Applied Code Reading: Debugging FreeBSD Regex
When the code we're trying to read is inscrutable, inserting print statements and running test cases can be two invaluable tools. Earlier today I fixed a tricky problem in the FreeBSD regular expression library. The code, originally written by Henry Spencer in the early 1990s, is by far the most complex I've ever encountered. It implements sophisticated algorithms with minimal commenting. Also, to avoid code repetition and increase efficiency, the 1200-line main part of the regular expression execution engine is included in the compiled C code three times, each time with different macro definitions that adjust the code's behavior. The first time the code targets small expressions and operates with bit masks on long integers; the second time it handles larger expressions by storing its data in arrays; and the third time it is further adjusted to handle multibyte characters. Here is how I used test data and print statements to locate and fix the problem.
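To make the "same code compiled several times under different macros" idea concrete, here is a minimal single-file sketch. It is only an approximation of the approach described above: it uses a function-generating macro rather than repeated inclusion of a source file, the toy "engine" merely marks states and counts them, and every name in it (`ENGINE`, `STATES`, `SET`, `ISSET`, `CLEAR`, `MAX_STATES`, `small_count`, `large_count`, `MAX_LARGE_STATES`) is hypothetical and not taken from the FreeBSD source. The multibyte variant is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap on the number of states in the array-based variant. */
#define MAX_LARGE_STATES 1024
typedef char large_states[MAX_LARGE_STATES];

/*
 * The engine body, written once in terms of STATES, CLEAR, SET, ISSET and
 * MAX_STATES. Those names are resolved when ENGINE() is expanded, so
 * redefining them between expansions yields a different function each time.
 */
#define ENGINE(fname)                                                   \
    size_t fname(size_t nstates, const size_t *marks, size_t nmarks)    \
    {                                                                   \
        STATES st;                                                      \
        size_t i, count = 0;                                            \
        assert(nstates <= MAX_STATES);                                  \
        CLEAR(st);                                                      \
        for (i = 0; i < nmarks; i++) {                                  \
            assert(marks[i] < nstates);                                 \
            SET(st, marks[i]);                                          \
        }                                                               \
        for (i = 0; i < nstates; i++)                                   \
            if (ISSET(st, i))                                           \
                count++;                                                \
        return count;                                                   \
    }

/* First instantiation: the state set is a bit mask in one unsigned long. */
#define STATES       unsigned long
#define MAX_STATES   (sizeof(unsigned long) * CHAR_BIT)
#define CLEAR(v)     ((v) = 0UL)
#define SET(v, n)    ((v) |= 1UL << (n))
#define ISSET(v, n)  ((((v) >> (n)) & 1UL) != 0)
ENGINE(small_count)

#undef STATES
#undef MAX_STATES
#undef CLEAR
#undef SET
#undef ISSET

/* Second instantiation: the state set is an array with one char per state. */
#define STATES       large_states
#define MAX_STATES   MAX_LARGE_STATES
#define CLEAR(v)     memset((v), 0, sizeof(v))
#define SET(v, n)    ((v)[(n)] = 1)
#define ISSET(v, n)  ((v)[(n)] != 0)
ENGINE(large_count)
```

Both generated functions share one body, so a fix to the logic lands in every variant at once. The price is the one the paragraph above hints at: the reader has to keep track of which macro definitions are in force to know what a given line actually does.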