This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.
This problem consists of the validation of messages according to a set of nested rules. The simplest are just a string, others are a sequence of rules (which is to be interpreted as the message being partitionable into substrings that match exactly those rules in that order), and the more complex ones include multiple alternative sequences (i.e. at least one of those sequences must match).
Since the numbers of rules is not too large, we will attempt a simple approach to begin with, and refine it if needed. Let's build all potential matching patterns for each rule.
Part 1
We'll start by building a Rule class. It will hold the patterns (once evaluated), a rule 'specification', and a pointer to the set (actually a dictionary) of rules, so it can refer to other rules when evaluating the patterns. It will also provide the overall behaviour, like creating the Rule from the textual specification given in the input, accessing the patterns, and checking for a match. Then we will have subclasses that specialize the evaluation of patterns according to the three types of rules described above.
Reading the input is quite straightforward: in each line the colon separates the key (an int identifying the rule) and the specification fed to the Rule constructor. The resulting rule will be the corresponding value in the ruleset dictionary.
The Rule constructor is a class method that will decide which subclass to use depending on the specification. Each subclass has a different representation of its specification:
- Leaf Rules just have the string they match (stripped from the quotation marks)
- Sequence Rules have a list of the identifiers of the rules they concatenate
- Alternative Rules have a list whose elements (the alternatives) are in turn lists like those in Sequence Rules
Leaf Rules generate just one pattern, the string they match.
For Sequence Rules, we go back to itertools.product to generate the Cartesian product of patterns matched by each rule in the sequence.
For Alternative Rules, we apply the same procedure as for Sequence Rules on each alternative, and join those together (actually, we could just have Alternative Rules, where Sequence Rules would have just one alternative).
Once all rules are built, we take rule zero, which is the one we need to check, and build its patterns (this is done implicitly in the first match, and cached for subsequent ones; it will recursively build the patterns of all rules that are referenced from it). Then we check the match against the messages.
The matching is trivial: is the message in the set of matching messages?
Part 2
This part introduces loops in two of the rules. Handling the general case would be complex, but we can examine the specific rules given here and notice what they mean.
Our root rule is (8, 11); rule 8 is (42 | 42, 8); rule 11 is (42, 31 | 42, 11, 31). So rule 8 is one or more 42s, and rule 11, though a little harder to see, is one or more 42s followed by the same number of 31s. So if we take this rule 11, and prepend to it at least one 42, our root rule becomes one or more 42s followed by one or more 31s, with strictly fewer 31s than 42s.
Since everything else is the same, we can look at the results of part 1 to see that all patterns for 42 and 31 are of the same length, so we can divide the message in chunks of that length. The it's a matter of checking that we have one or more chunks that match 42, that they cover more than half of the message, and that the rest match 31.
For additional efficiency, we can discard any message that doesn't divide neatly into these chunks.
No comments:
Post a Comment