Pages

2020/12/08

Advent of Code 2020 - Day 6

 This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

This puzzle builds a little on previous ones. The input parsing requires building groups, separated by blank lines, and we already did that for Day 4. In this case each line will be a set, and the group a list of those sets (the simplest to build the groups). Apply the same caution as before with potential empty groups.

Part 1

We have to count the number of distinct elements in each group, then add the all groups together. Or, in other words, for each group the cardinality of the union of the sets in the group. We just build a quick loop that emulates a reduce operation applying set union, starting with one of the sets in the group.

That gives us the count or each group, we then add them together.

Part 2

Now we are required to count the elements present in all sets in the group instead of all elements present in all sets, or the intersection of the sets. We repeat the procedure above applying the intersection instead of the union for the reduction, and done.

Advent of Code 2020 - Day 5

This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

This one is a breeze with the right idea. The common part is converting the seats into the numerical IDs. At first sight, it looks like a bisection scheme would fit the description perfectly: keep track of the upper and lower bound, and depending on the letter keep the high or low half-interval until it is reduced to a single number. Do that for the row and repeat for the column. It's not even hard to do, but it can be simplified.

Instead we can notice that the scheme it describes corresponds exactly to decimal encoded binary: each position encodes this half-interval concept by adding or not the corresponding power of two. So the first of the seven digits encoding the row leaves the number encoded with the other six bits in its [0, 63] range if it's 0, or moves it into the [64, 127] range, and so on.

Furthermore, multiplying by 8 is equivalent to shifting three positions to the left; then adding the column part just fills in those three positions. If we take the seat as given, and substitute the letters with 0s (for F, L) or 1s (for B, R), we only need to interpret that as a base-2 int to get the ID.

Part 1

Once we have the IDs, finding the maximum is trivial using the max built-in.

Part 2

Now we need to find the gap in the consecutive IDs, but the start and end are not known.

In reality we know the end, that was part 1. We can calculate the minimum in the same way, then go from minimum to maximum until we find the gap. That would require building a set of IDs to check each value in constant time. And it would be the most efficient approach (linear time to build the set, get minimum and maximum, and then traverse the range). The latest code does this with set operations. The resulting set should contain just one element, per the problem statement.

In the first commit for the day, I did it differently, in O(n log(n)),  by sorting the IDs, the traversing the sorted array looking at each value and the next. When the difference is greater than one, that's our gap.

Advent of Code 2020 - Day 4

This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

Another one for parsing and validating.

Part 1

The parsing part is a little more complex than what we have seen so far, as we can no longer assume that a record corresponds to a line. Other than that, it's just colon-separated key, value pairs. So instead of building a record in one go, we start with an empty one, adding key, value pairs as they come until reaching a newline, which marks a record switch.

Two potential pitfalls to look out for. First, remember to add the record still being built at the end of the loop. Second, both at this step, and whenever closing a record and opening a new one, check whether the record has any contents; two consecutive blank lines or a blank line at the end of the file would introduce an empty record otherwise.

The validation in this part just needs to check if a record contains all required fields; once again generator expressions make this condition sound almost as natural language:

all(field in record for field in fields)

Then just go through all the records and count those that are valid.

Part 2

This part makes the validation a little bit more involved by specifying requirements for the values in each field. Just validate each.

Some are straightforward, especially after converting the value into a number. Others require a little more work. Some points of interest:

  • The sets defined at the top of the module allow a quick check of some fields. In particular, the pid field needs to be checked like this to ensure that it has leading zeroes to pad up to 9 digits.
  • Height needs to be treated with care, as the units are not always present, and assuming so may leave an empty string for the numeric part; therefore we require at least four characters, or return there as invalid. After that, it's easy.

Advent of Code 2020 - Day 3

This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

A quick and easy one. This exercise consists on traversing a field defined by the input file following a given pattern, and counting how many of the positions traversed contain a '#' character (a tree in the puzzle). The pattern is given by a pair (r, c), meaning that each step advances r rows and c columns, with columns wrapping over if the width is exceeded.

Part 1

We are given a pattern, and we have to count the trees. Parsing the input is trivial and doesn't merit additional explanation.

The traversal itself is not especially complex, either. We update row and column with each step, using modulo arithmetic to easily wrap the columns around, and check if the new position is a tree. We end once we run out of rows.

Part 2

Rinse and repeat, with several patterns. If you were careful to use the condition row < height rather than row != height when checking the stopping condition everything will work, otherwise, some patterns may skip the row where row == height and never end.

Advent of Code 2020 - Day 2

 This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

Still on the warm-up phase, but we are already bringing in regular expressions. The patterns are easy enough to solve with basic string methods, but the regular expressions make it much clearer.

Part 1

We get a file in which each line is a constraint specification followed by a word, and we are asked to count how many lines are consistent (i.e. the word is valid according to the constraint). Each line is of the form:

n-m x: word

Here, n and m are integer numbers, and x is any single letter. The constraint is that the letter given by x must appear between n and m times in word.

So we have two tasks:

  1. Parse the file into paired costraint specifications and words
  2. Evaluate validity of each pair
The first task is straightforward matching on each line the regular expression:

(?P<min>\d+)-(?P<max>\d+) (?P<letter>[a-z]): (?P<pwd>[a-z]+)

We can then use the named groups to retrieve each relevant piece of information (and cast min and max to ints). With this small modification we can directly use the dictionary that we get from match.groupdict() as the specification.

Validation is equally straightforward. We can use a generator expression to compress counting the number of times the letter appears in the word while making it more readable than the equivalent loop:

sum(1 for c in spec['pwd'] if c == spec['letter'])

Compare it with:

total = 0
for c in spec['pwd']:
    if c == spec['letter']:
        total += 1

Part 2

For this part the validation rules change, so we keep the same parsing. The new validation rule is that the letter must appear in exactly one of the two given positions (interpreting n and m as 1-based indices).

The first gotcha is the 1-based indices. Not difficult, but easy to forget. Otherwise the validation seems to be quite direct as a XOR of the conditions that the letter at each of the two positions be the given letter:

(pwd[pos1] == letter) != (pwd[pos2] == letter)

However, this can lead to out-of-bounds errors, so there is some accounting to do before reaching this. If the word is so short that the first position is not even included, then it is not valid; if only the first position is within the word, the word is valid if the letter at that position is the given letter. If both are included, we fall back to the XOR.

Advent of Code 2020 - Day 1

This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly. 

The first day is just a warm-up exercise, but still there are some subtleties, specifically in the complexity of possible solutions.

Part 1

The first part reduces to finding a pair of numbers in a list that add up to 2020. I checked the list for duplicates, and there are none, which makes the use of sets a viable option. We will see why this is important in a moment.

The naive approach would be to check each possible pair, which would result in quadratic complexity, not bad, but this can be easily achieved in linear time. By using sets, so that membership checks are constant time, otherwise these would be linear time and we would be back to quadratic.

The solution is taking each value n in the set, and checking whether target - n is in the set. If it does, we have our answer. We return it as n * (target - n) as the requested answer is the product of the numbers.

This requires two linear operations: building the set, and running through it once checking each number (the check is constant time, as mentioned above).

Part 2

The next exercise is finding three numbers that add up to 2020. If we hadn't gone for a linear solution in the first part, we would be looking at a cubic time solution for this one. As it is, we can manage to limit it to quadratic time, by checking for each number n whether there are two numbers in the rest of the set that add up to target - n, and this can be done using the approach from Part 1.

Of course, in all the discussion complexities are worst case. The algorithms will end as soon as the right answer is found, which can happen with the first numbers that it evaluates. In the worst case, it will try all potential combinations to determine that there isn't one that meets the requirements. In that case, the return value will be None.

Advent of Code 2020

 I have followed the Advent of Code site for several years now. I don't usually have the time to fully finish it, as the time required for the latest and hardest exercises becomes more than I can spare for it. But who knows? This year might be different. It has definitely been a very different year so far.

As an additional exercise, I will keep a log of the solutions here; one post per day. The code will be in this GitHub repo, if you want to follow it. The posts will deal with the rationale behind the solutions and the choices made.

There are some general aspects of the repo, that may be relevant to highlight. First, the exercises for each day are in a puzzleXX.py file, where XX is the corresponding day. The puzzle input if needed is in a file with the same name and .in extension. In some cases the puzzle gives a smaller example that is useful for testing, and those have a .in.test extension; since each puzzle has a read_input() method, changing the file that is read there will switch between the full and test input. Remember that the input files are personalized, so using mine will not give you the right answer for you, you'll need to substitute your own input files.

The different aoc.* scripts help in removing boilerplate. aoc.py allows to run specific puzzles and parts, with aoc and aoc.bat being thin wrappers (for Linux and Windows, respectively) so that part p of day d can be run with [./]aoc d p. All puzzles return the expected answer so it can be copied directly.

I will link to the posts for each day below, or you can use the aoc2020 tag to filter them.

Links to posts: