As a general rule, I find I’m not happy with a software project until at least the third ground-up rewrite. Rewriting is usually discouraged in business, and not without some good reason. But in my spare time, I often find myself happily refactoring my code, each design better than the last, but never quite getting any further before I spot some flaw that inspires another redesign.
Haskell seems to encourage this constant honing, not just for complex modules or sophisticated libraries, but even for the smallest function. It’s refactoring in miniature; a sort of code bonsai.
I was recently trying to create a netstrings implementation for Haskell, and I started off with a function that would strip the first length indicator off. So for instance, “5:hello,” would become (5, “hello,”)
readLength :: String -> (Int, String) readLength s = (read n, tail s') where (n, s') = break (== ':') s
This was my first attempt. Did it work? Yep! Could I do better? Maybe…
I recently came across a blog post that demonstrated how useful arrows could be, and it seemed like they could be used here as well. In the above case, I’m applying read and tail to the first and second arguments of a tuple, but the arrows library provides a function that does that already:
(***) :: (b -> c) -> (b' -> c') -> (b, b') -> (c, c')
Refactoring my code to use the Control.Arrow resulted in a rather lovely piece of code:
import Control.Arrow readLength :: String -> (Int, String) readLength = break (== ':') >>> read *** tail
So concise! So succinct! So devoid of error checking… As beautiful as that one line of code was, it was just going to get messier when I started putting in conditions to ensure that the data was formatted correctly, and adding all of the other functions needed to read a stream of netstrings. What I needed was something that could interpret a piece of formatted data… something like a parser…
I honestly don’t know why it took me so long to think of using Parsec, but once I did the task suddenly became trivial, and I bashed out the entire netstring parser in almost no time at all:
import Text.ParserCombinators.Parsec number :: Parser Int number = do d <- many1 digit return $ read d netString :: Parser String netString = do n <- number char ':' s <- count n anyChar char ',' return s netStrings :: Parser [String] netStrings = do ns <- many netString eof return ns
It doesn’t use arrows, but it’s clear, concise, and will throw a useful parse error when it comes across a problem. The only downside with using Parsec is that it doesn’t have ByteString support, so the efficiency of this implementation is likely to suffer under heavy traffic. Luckily, however, there’s a Summer of Code project to add ByteString support to Parsec. When that comes out, I may benchmark my code and compare it against a C implementation.