Pages

2020/12/09

Advent of Code 2020 - Day 7

This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

This puzzle starts to ramp up the difficulty. We are asked to work with a set of rules for suitcases. From the sample input, it seems that each type of suitcase, based on colour, must contain certain numbers of other two types, or no other suitcases. But checking the full input we can see that the number of types need not be that limited, so we will assume an arbitrary number when parsing.

The parsing is relatively simple, as the input format is heavily structured, allowing us to split on the word 'contains' to separate container from contents and then further separate the contents into individual types by splitting on commas. We can check the special case of no other types contained when only one block of contents is present, and then use a couple of regular expressions to extract the type of container and the types and numbers of contents.

We store the rules as a dictionary, for quick reference. For each type as key, the value is None if it can contain no other types, or a dictionary of type, number pairs.

Part 1

The question for part 1 is which types can contain, even if several levels deep, a specific type. If we assemble the rules as a tree, we only need to find the node for the target type and report the intervening nodes from the root. But that assumes that the combination of the rules will respect a tree structure.

Therefore, we will not make the assumption, and consider still a graph structure. We can ignore the numbers for this part, as we only need to check connectivity. If we invert the rules, making the edges going from content to container, we can just check which nodes are reachable from the node corresponding to the target type. We will call this the 'containment graph'.

A quick and dirty way to represent such a graph is a dictionary where the types are the nodes, represented as keys, and the corresponding values are sets of adjacent nodes, in this case the types that can contain it. We use a defaultdict(set) so that we can start with an empty graph and just add edges; the data structure will take care of creating an empty set the first time we add an edge associated to a node.

Once we have the graph, it's easy to evaluate the reachable nodes from our target type. We just start from the target, add all adjacent nodes and recursively do the same to each of them. In practice, we use a queue containing initially the target node. As long as we have elements to process in the queue, we take one, add it to the reachable set, and add all its successors that we haven't yet visited to the queue; once we have visited a node, we are guaranteed to reach all nodes reachable from it without revisiting it, and without this precaution we could end up in an infinite loop if there are cycles in the graph.

Part 2

The second part asks us how many suitcases we will need to have inside of our target type by applying the rules. This is conceptually simple using a recursive solution: the count of total suitcases for a type is 1 (itself) plus the sum of the counts for the types it must contain, each times the corresponding number in the rule. Those counts are recursive calls to this same calculation, and the base case corresponds to types that do not contain other types.

In practice it's necessary to be more careful, as this is multiply recursive and that can pose problems, as anyone who has tried a direct recursive implementation of the Fibonacci sequence can attest. Seriously, if you haven't, build a naive recursive implementation, see how times increase as you move up, and whether you are patient enough for fib(50). Here, you don't even have to write it:

def fib(n): 
    if n == 0 or n == 1:
        return 1
    return fib(n - 2)  + fib(n - 1)

To work around this, we will cache results as we reach them. We include an accumulator (acc), a dictionary that will hold known counts. Instead of going directly to calculating the result, in each call we first check the cache, use the cached value if it exists, and otherwise perform the calculation and store it into the cache.

Finally, when we get the count, we need to remove the initial target type suitcase, as the question is how many suitcases are required inside it!

If you're curious, the equivalent trick for the Fibonacci sequence would be:

def fib(n, acc=None):
    if acc is None:
        acc = {0: 1, 1: 1}
    if n not in acc:
        acc[n] = fib(n - 2, acc) + fib(n - 1, acc)
    return acc[n]

But for this you'd actually be better off just using the lru_cache decorator from the functools module on the naive version, just saying. Try either, and now you can know fib(50).

No comments:

Post a Comment