Pages

2021/01/11

Advent of Code 2020 - Day 18

This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.

This problem is basically evaluating expressions with varying operator precedence rules. It screams abstract syntax trees (ASTs), but I am a little out of my depth there, not my area of expertise. I'm sure that there are more elegant ways to solve this, potentially using the , but I decided to build quite an ad-hoc approximation.

Part 1

Addition and multiplication have the same precedence for this part, operations are evaluated left-to-right, and parenthesis remain highest priority as usual.

We will begin building our simplified version of an AST by defining a Node class and a Leaf class. The Nodes will have an operation to perform and left and right attributes pointing to the arguments (all operators are binary). We use a property to recursively evaluate the operation. The Leaves just contain values, and offer the same property as the Nodes to return the value (this is the base case for the recursion).

With this, we only need to build the AST from the literal expression. We will do it in two steps: tokenizing and parsing. The former will convert the expression into its semantic components (tokens), while the latter will build the AST from the sequence of tokens.

We are actually complicating the tokenization a little bit by considering multi-digit numbers (after looking at the input, all numbers are single-digit), but it's not that bad, so we'll keep it.

Basically, we are just looking at each character, and returning the appropriate symbol for it: if it's a digit, convert it to a number and wrap it up as a Leaf; if it's a paren, it's its own symbol; if it's an operator, return the corresponding Node (without linking left and right for now).

To account for multi-digit numbers, we just keep appending digits to a number as long as we get digits, and when we get a paren or an operator we convert the accumulated number and reset the accumulator, besides producing the paren or operator symbol.

Since we will evaluate left-to-right, we will reverse the sequence of symbols. This allows for easier parsing; grouping the operations as they come will group them properly in this way, considering a recursive parsing, which is easier to grasp: the first operation is applying the rightmost operator on the rightmost value and the subtree resulting from parsing everything to the left of the operator (leaving parens out of the explanation for clarity).

So we can take this explanation, and use it as the template for the recursive parsing (this time including the parens):

  1. Take the first token. If it's a Leaf, it becomes the root of the AST, if it's a closing paren the root is the (recursive) parsing of the following tokens, as we will return from the function when reaching an opening paren. The roles of opening and closing parens are intentionally inverted due to our inversion of the token stream.
  2. If there are no more tokens, or the next token is an opening paren (which, remember, is actually closing the parenthesized sub-expression), return the AST (the current root).
  3. Take the operator (the only remaining option for the next token) and assign the root to its right, then make the operator node the new root. Finally, assign to its left the (recursive) parsing of the rest of the sequence.
Evaluation of the expression is straightforward: tokenize, parse, return the value of the AST.

Part 2

In the first part we didn't have to look at the operators as both of them had the same precedence, so we just applied them as they came. For the second part addition takes precedence over multiplication (otherwise it would be too easy, as we could use the built-in eval function).

We will change strategy slightly now. We start by pre-parsing the token stream and making each parenthesized sub-expression a list (and remove the parens symbols). This is accomplished quite easily with a recursive implementation: tokens are kept as-is, starting a paren group gets a recursive call, end when reaching the end of a paren group or the end of the token stream.

The new parsing will recursively parse parenthesized sub-expressions, so that we always apply operations to flattened lists of Leaves and Nodes, but we need to consider that the Nodes may be tree roots already (from a parenthesized sub-expression). This means that we first parse the sub-expressions, then apply all the addition operators, then apply the multiplications, and we are left with the resulting tree. Since we will be doing all of this accumulating on a list, the result is a single-element list that we need to pop.

The application of the operators is the most delicate part of the exercise. Unlike in the first part, we cannot assume that any Node is an unbound operator, as there will be multiple passes, and parenthesized sub-expressions will be processed before. So, to check if a Node is the type of operator that we want to apply we need to check that it's the right type and also that its right attribute is not yet assigned. We define the isop function to check this throughout.

The first token goes directly into the sequence that will be returned. After that we have two possibilities:

  • The new token is the type of operator we want to apply. If the latest token in the sequence is also of this type (which implies not having its right assigned) we assign the new token to its right (this is just for symmetry, this should never happen, and would actually leave the right of the new token unassigned and unassignable). In the normal case, we just pop the last token from the list, assign it to the left of the new one, and put the new one in the list.
  • The new token is not an operator of interest. We just add it to the list, unless the currently last token is an operator, in which case this goes to its right.
What we are doing effectively is taking every operator of the specified type that hasn't been processed yet and connecting it to their right and left, taking care that there may be other operators around.

We now have all steps: tokenize (same as part 1), apply parens, apply the new parsing with multiple passes for the operators. Finally get the value of the resulting AST.

One interesting thing to note is the usefulness of iterators for all of the recursive implementations: when you pass the iterator to a recursive call it will keep drawing from the point where it was, rather than restart (which would happen with an array or list). This removes the need to keep track of the position across calls, which would make everything more cluttered (the additional variable in the function, in the signature so it can be passed in, and as a return value to know where to continue when the recursion returns).

No comments:

Post a Comment