Good Haskell Style

Introduction

There is a common misconception that the purpose of writing Haskell programs is to instruct a machine to perform certain computations, and to report the results thereof. This myth is perpetuated by numerous lecture courses, books, websites and other media, which merely teach how to write programs with the desired functionality.

In actual fact, to write a good Haskell program is to create a work of art. As such, any Haskell program worth writing must be beautiful. Indeed, rumour has it that the GHC optimiser analyses each program it is asked to compile and, if it finds the program aesthetically pleasing, will work harder to make it run faster.

But what determines whether a Haskell program is beautiful or not? Read on, my intrepid friend, and you shall find out!

Notation

Not to give too many clues away as to what is coming up, but let's start off with a little notation. Haskell code will be written thus:

f = map g
g = id

As spaces and tabs are invisible, where we need to talk about them we will shade them thus:

space = ' '
tab   = '       '

Unless otherwise stated, tabs will continue to the next column that is a multiple of 8, as required by the Haskell report.
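
To make the tab rule concrete, here is a minimal sketch of tab expansion. The name expandTabs and the choice to count columns from 0 are ours rather than the report's, and the tab width is assumed to be positive.

```haskell
-- | Expand the tabs in one line, given the distance between tab stops.
-- Columns are counted from 0 here, so a tab always advances to the next
-- multiple of the tab width. The width is assumed to be positive.
expandTabs :: Int -> String -> String
expandTabs width = go 0
  where
    go _ [] = []
    go col ('\t' : cs) =
        let next = (col `div` width + 1) * width
        in replicate (next - col) ' ' ++ go next cs
    go col (c : cs) = c : go (col + 1) cs
```

For example, expandTabs 8 "ab\tc" yields "ab", six spaces, and then "c", since the tab carries us from column 2 to column 8.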

Trailing white space

Let's warm up with a nice easy one: trailing white space.

Quite simple really: trailing white space is ugly. Just to make it concrete, this is bad:

f x = 
    let y = x   
    in y

and this is good:

f x =
    let y = x
    in y

Our favourite revision control system can help keep us on the straight and narrow here; if you try to record the bad example above then it will prompt you with

hunk ./foo.hs 1
+f x = $
+    let y = x^I$
+    in y

clearly highlighting the trailing white space.
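
Should your revision control system be less obliging, the check is easy to sketch yourself. The function names below are our own invention, and "white space" is taken to mean whatever Data.Char.isSpace accepts.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | True if the line ends in a space, a tab, or other white space.
hasTrailingSpace :: String -> Bool
hasTrailingSpace line = not (null line) && isSpace (last line)

-- | The 1-based numbers of the lines that end in white space.
trailingSpaceLines :: String -> [Int]
trailingSpaceLines src =
    [ n | (n, l) <- zip [1 ..] (lines src), hasTrailingSpace l ]

-- | Remove the trailing white space from every line.
stripTrailingSpace :: String -> String
stripTrailingSpace = unlines . map (dropWhileEnd isSpace) . lines
```

Applied to the bad example above, trailingSpaceLines reports lines 1 and 2, and stripTrailingSpace turns it into the good one.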

Tabs

Now we're nicely warmed up it's time to turn the heat up some more, with one of the more contentious issues: tabs.

Haskell defines tabs to align to the next column that is divisible by 8, informally referred to as "8 space tabs". It is this definition that is used by the layout rule.

However, many people configure their editors with 2 or 4 space tabs. This means that when editing code with deeply nested blocks, such as this C:

int main(void) {
        if (1) {
                while (1) {
                        ...

the code is less likely to be wider than the editor.

Before we begin, it is worth taking a look at what happens when no policy is agreed upon. Here are two excerpts from some real code; the first is from the GHC sources, while the second is from the base package:

  let dflags1 = dflags0{ ghcMode = mode,
                         hscTarget  = lang,
                         -- leave out hscOutName for now
                         hscOutName = panic "Main.main:hscOutName not set",
                         verbosity = case cli_mode of
                                         DoEval _ -> 0
                                         _other   -> 1
                        }

  if eol
        then do if (w == off + 1)
                        then writeIORef ref buf{ bufRPtr=0, bufWPtr=0 }
                        else writeIORef ref buf{ bufRPtr = off + 1 }
                return (concat (reverse (xs:xss)))
        else do

I think we can agree that these examples are ugly, and there are many more like them to be found. As we will see, this undisciplined style of indentation causes other problems as well.

Some people have claimed that a benefit of tabs is as a primitive form of compression. While this is undeniably true, we believe that this is a minor benefit given the size of source code and today's abundance of disk space.

But what is the disadvantage of treating tabs merely as white space compression? Consider this function definition:

f x = case x of
      Just y ->
          case y of
          Just z ->
              z

If people read the code with tabs configured as Haskell dictates, then of course they will see the correct definition you wrote. However, suppose a reader has their editor configured to show 2 space tabs; then they will see:

f x = case x of
      Just y ->
    case y of
    Just z ->
        z

which no longer looks like a syntactically valid program! Let us say that this program is tab-significant, i.e. the width of a tab can change the meaning of the program, while the earlier C program example was tab-insignificant. It is possible to use tabs in Haskell code and still write tab-insignificant Haskell programs, but doing so is either non-trivial or constrains your code layout. Let us now look at each in turn.
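
Before doing so, we can check the effect of tab width on the earlier definition mechanically. The sketch below assumes that the deeper lines of that example begin with a tab followed by spaces, which the listings here cannot show; indentColumn and example are names made up for this illustration.

```haskell
-- | The column (counting from 0) of the first character of a line that
-- is neither a space nor a tab, for a given positive tab width.
indentColumn :: Int -> String -> Int
indentColumn width = go 0
  where
    go col (' ' : cs)  = go (col + 1) cs
    go col ('\t' : cs) = go ((col `div` width + 1) * width) cs
    go col _           = col

-- | The definition of f from above, assuming (our guess) that each of
-- its last three lines starts with a tab.
example :: [String]
example =
    [ "f x = case x of"
    , "      Just y ->"
    , "\t  case y of"
    , "\t  Just z ->"
    , "\t      z"
    ]
```

With these definitions, map (indentColumn 8) example gives [0,6,10,10,14], while map (indentColumn 2) example gives [0,6,4,4,8]: the inner case slips to the left of Just y, just as in the second listing.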

First the non-trivial way. Consider this program:

f x = case x of
        Left y -> let z = Right y
                  in case y of
                        True -> print z

Here we use tabs to indent the blocks of code which the layout rule will be creating. Let us look at the last line of the program. We must first use 6 spaces to indent to the level of the outer case expression, and then a tab to enter the block started by the case. A further 13 spaces take us to the level of the inner case, and a second tab finally gets us into the block that this case starts.

However, while this is correct, it is not easy to do by hand, nor is it easy to instruct an editor to do it for you automatically. Thus we don't wish to require people to stick to this style.

Now let us consider the constraining style. Here, each time we start a new block we also start a new line, and indent with one more tab than previously. Here is the same example rewritten in this style:

f x = case x of
        Left y -> let
                z = Right y
         in case y of
                True -> print z

As you can see, there are also a couple of tweaks needed to make this style work correctly, such as indenting "in" by a space. It can be made to work, though, and is simple. So what is wrong with it? Unfortunately, it constrains you to a code layout that we believe is ugly.

So, we have eliminated both options; what is left? Simple: Do not use tabs. At all. Now all programs look identical everywhere, and everyone is happy, regardless of their editor settings! From simplicity comes beauty.

Line length

Resolving the tabs question in favour of "no tabs" means that the question of how long a line should be becomes a valid one to ask.

For better or for worse, the nearest thing we have to a standard editor width is 80 columns. Therefore we should strive to keep lines to at most 79 columns. Sometimes the extra ugliness necessary to contort the code to this restriction will be worse than just letting a line or two wrap, but it's a good target.
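
A rough sketch of a checker for this target follows. The name longLines is ours, and the sketch assumes the source contains no tabs (as recommended above) and no characters wider than one column, so that a line's length is also its width.

```haskell
-- | The 1-based numbers and lengths of the lines longer than the limit.
longLines :: Int -> String -> [(Int, Int)]
longLines limit src =
    [ (n, len)
    | (n, l) <- zip [1 ..] (lines src)
    , let len = length l
    , len > limit
    ]
```

Calling longLines 79 on the contents of a file lists the lines that deserve a second look; whether to contort them or let them wrap remains a matter of taste.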

CamelCase versus underscores

Another bone of contention between Haskellers is whether to go for CamelCase, e.g. myVariableName, or underscores, e.g. my_variable_name.

We advise the use of CamelCase for two reasons. First, the report and standard libraries use it, so it helps keep Haskell code consistent. Second, we have just decided on a maximum line length, and the extra character or two saved in an identifier or two can make the difference between a line that simply does not fit nicely in 79 columns, and one that makes it look effortless.
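
For the curious, the conversion itself can be sketched in a few lines. The function name toCamelCase is our own, and the decision to leave leading, trailing and doubled underscores alone is an assumption made for simplicity, not a rule from the report.

```haskell
import Data.Char (toUpper)

-- | Turn an underscore-separated name into CamelCase. Leading
-- underscores are kept, so a variable name stays a variable name.
toCamelCase :: String -> String
toCamelCase name = leading ++ go rest
  where
    (leading, rest) = span (== '_') name
    go ('_' : c : cs) | c /= '_' = toUpper c : go cs
    go (c : cs) = c : go cs
    go [] = []
```

Here toCamelCase "my_variable_name" gives "myVariableName", which is two characters shorter than the original.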

Layout

While the Haskell syntax is defined with braces and semicolons, similar to lesser, more traditional, languages, we are also able to omit the braces and semicolons and have them inferred by the parser. But which is the better style?

Some people argue that the use of explicit braces and semicolons makes code easier to understand, particularly for those new to the language. For example, in

f x = case x of {
          Just y ->
              case y of {
                  Just z ->
                      z;
                  Nothing ->
                      1
                  }
              };
          Nothing -> 0
      }

the structure of the program is clearly defined by the braces and semicolons. Unfortunately, after a few bug fixes, it is all too easy to end up with code that looks more like this:

f x = case x of {
          Just y ->
              case y of {
                  Just z ->
                                    z;
      Nothing ->
          1
                  } };
                  Nothing -> 0
      }

By following the braces and semicolons it is still possible to see the program structure, but it is now obscured by poor layout. By contrast, if we write the program with implicit layout, thus:

f x = case x of
          Just y ->
              case y of
                  Just z ->
                      z
                  Nothing ->
                      1
          Nothing -> 0

then our Haskell implementations will make sure that we keep it vaguely readable, by giving a parse error if we break any of its rules. Note also that omitting the punctuation gives smaller, cleaner-looking code. Thus we recommend that you use implicit layout.
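
To see that the two styles really do describe the same function, here is a small sketch giving the example a type. The type signatures and the names fExplicit and fImplicit are our additions for illustration.

```haskell
-- | Explicit braces and semicolons: inside the braces the layout rule is
-- switched off, so the indentation carries no meaning.
fExplicit :: Maybe (Maybe Int) -> Int
fExplicit x = case x of {
    Just y -> case y of { Just z -> z; Nothing -> 1 };
    Nothing -> 0 }

-- | Implicit layout: the indentation alone determines the structure.
fImplicit :: Maybe (Maybe Int) -> Int
fImplicit x = case x of
    Just y ->
        case y of
            Just z -> z
            Nothing -> 1
    Nothing -> 0
```

Both functions map Just (Just 7), Just Nothing and Nothing to 7, 1 and 0 respectively; only the second will complain if its indentation is disturbed.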

Conclusion

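And that concludes our journey. To summarise, then: avoid trailing white space, do not use tabs, keep lines to at most 79 columns, prefer CamelCase to underscores, and use implicit layout.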

In order to help you on your quest to write beautiful programs, we have some tips on how you can get vim to assist you.

I look forward to admiring your artistic creations in the future!