This is a running series of posts on the Advent of Code 2020. The introductory post has the details, including the accompanying GitHub repo, in case you reached this directly.
Today's problem is very interesting, even if it looks simple. We start with a set of values, which we read into a list as usual: strip, split on comma, cast to int.
Part 1
We are asked to find the value that minimizes the sum of distances to the given values. This is a well-known problem in optimization, and the answer is the median of the points. In one dimension there is little to think about; for higher-dimensional spaces, the result depends on the norm used.
The median is easily calculated: sort the values and take the value in the central position of the ordered array (same number of elements above and below). For an even number of items, take the midpoint of the two central values.
Part 2
Now we change the definition of the distance, and for an absolute difference of n we have a distance of 1 + ... + n. The first thing we note is that this is the harmonic series, and the sum is known to be n*(n+1)/2, which will be an integer as either n or n+1 will be even. That will save some looping.
With this new distance, the median is no longer a valid answer. Instead we will run an optimization algorithm. In this case, a modified bisection. Since the distance behaves like a norm, the optimization problem of minimizing the sum of distances is convex, and the derivative (with respect to the selected point) will be increasing, starting negative and ending positive.
So we are looking for a point that is lower in this sum of distances than the points surrounding it on either side. We set the high and low limits at the extremes of the points, and look at the midpoint of that. If the points to either side are both larger in sum of distances, we have our answer. If the sum increases (lower on the left, higher on the right), we are on the far side of the objective function, so we drop the high limit to the midpoint. If it's the contrary, we raise the low limit to the midpoint. We repeat this until we find the point we look for.
Since the span of the interval is halved at each step, this is guaranteed to happen in log2(N) steps for an initial range span of N.