Haskell: Refactoring in Miniature

As a general rule, I find I’m not happy with a software project until at least the third ground-up rewrite. Rewriting is usually discouraged in business, and not without some good reason. But in my spare time, I often find myself happily refactoring my code, each design better than the last, but never quite getting any further before I spot some flaw that inspires another redesign.

Haskell seems to encourage this constant honing, not just for complex modules or sophisticated libraries, but even for the smallest function. It’s refactoring in miniature; a sort of code bonsai.

I was recently trying to create a netstrings implementation for Haskell, and I started off with a function that would strip the first length indicator off. So for instance, “5:hello,” would become (5, “hello,”)

readLength :: String -> (Int, String)
readLength s = (read n, tail s')
                where (n, s') = break (== ':') s

This was my first attempt. Did it work? Yep! Could I do better? Maybe…

I recently came across a blog post that demonstrated how useful arrows could be, and it seemed like they could be used here as well. In the above case, I’m applying read and tail to the first and second arguments of a tuple, but the arrows library provides a function that does that already:

(***) :: (b -> c) -> (b' -> c') -> (b, b') -> (c, c')

Refactoring my code to use the Control.Arrow resulted in a rather lovely piece of code:

import Control.Arrow
 
readLength :: String -> (Int, String)
readLength = break (== ':') >>> read *** tail

So concise! So succinct! So devoid of error checking… As beautiful as that one line of code was, it was just going to get messier when I started putting in conditions to ensure that the data was formatted correctly, and adding all of the other functions needed to read a stream of netstrings. What I needed was something that could interpret a piece of formatted data… something like a parser

I honestly don’t know why it took me so long to think of using Parsec, but once I did the task suddenly became trivial, and I bashed out the entire netstring parser in almost no time at all:

import Text.ParserCombinators.Parsec
 
number :: Parser Int
number = do d <- many1 digit
            return $ read d
 
netString :: Parser String
netString = do n <- number
               char ':'
               s <- count n anyChar
               char ','
               return s
 
netStrings :: Parser [String]
netStrings = do ns <- many netString
                eof
                return ns

It doesn’t use arrows, but it’s clear, concise, and will throw a useful parse error when it comes across a problem. The only downside with using Parsec is that it doesn’t have ByteString support, so the efficiency of this implementation is likely to suffer under heavy traffic. Luckily, however, there’s a Summer of Code project to add ByteString support to Parsec. When that comes out, I may benchmark my code and compare it against a C implementation.