Not long ago, I tried to use the GHC API to do some static analysis. It turned out to be a lot more complicated than I expected, so I set out to fix some of that.
To write our non-trivial term constraint oracle tool, we had to write some static analysis code. We figured we would try to use the GHC API to handle parsing and type-checking, so we looked at the GHC API documentation from back then.
Down the rabbit hole
There isn't much documentation on that page to begin with, but an educated guess brought us to the GHC module. We searched for 'parse' and found parseModule. So far, so good. Combining the information from a tutorial for version 7.6 with what we saw in the type signatures, we managed to call this function and get a ParsedModule.
Now we could start writing our static analysis piece, right? How complicated can the ParsedModule be? More complicated than we thought, as it turned out. From a ParsedModule, we got a ParsedSource, which is just a type synonym for a Located (HsModule RdrName). Hmm, let's see: Located is a type synonym for GenLocated SrcSpan, and GenLocated had the first piece of documentation: "We attach SrcSpans to lots of things, so let's have a datatype for it." SrcSpan also had some documentation, so we were able to figure out that it means "Rectangular portion of a source file".
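To see how these synonyms fit together, here is a small, self-contained model of the location wrappers. It is a simplification written for illustration, not GHC's actual code: the real SrcSpan also has an UnhelpfulSpan case and wraps a RealSrcSpan, and the field names below (srcFile, startLine and so on) and the showSpan helper are hypothetical. The constructor name L and the helpers unLoc and getLoc mirror the ones in GHC's SrcLoc module (GHC.Types.SrcLoc in newer releases).

```haskell
-- A simplified model of GHC's location wrappers (not GHC's real definitions).

-- | Source Span: a rectangular portion of a source file.
--   Hypothetical field names; GHC's real type is richer.
data SrcSpan = SrcSpan
  { srcFile   :: FilePath
  , startLine :: Int
  , startCol  :: Int
  , endLine   :: Int
  , endCol    :: Int
  } deriving (Eq, Show)

-- | Generic Located: any value @e@ paired with a location of type @l@.
data GenLocated l e = L l e
  deriving (Eq, Show)

-- | Located: the common case, where the location is a 'SrcSpan'.
type Located e = GenLocated SrcSpan e

-- | Drop the location and keep the wrapped value.
unLoc :: GenLocated l e -> e
unLoc (L _ e) = e

-- | Keep only the location.
getLoc :: GenLocated l e -> l
getLoc (L l _) = l

-- | Hypothetical helper: render a span as "file:line:col-line:col".
showSpan :: SrcSpan -> String
showSpan s =
  srcFile s ++ ":" ++ show (startLine s) ++ ":" ++ show (startCol s)
    ++ "-" ++ show (endLine s) ++ ":" ++ show (endCol s)

-- | Example: the name "main", located on line 3, columns 1 to 5 of Main.hs.
exampleName :: Located String
exampleName = L (SrcSpan "Main.hs" 3 1 3 5) "main"
```

Here unLoc exampleName evaluates to "main", and showSpan (getLoc exampleName) to "Main.hs:3:1-3:5". Every node of the syntax tree is wrapped this way, which is why so many types in the API start with an L.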
Great, we know what SrcSpans are; let's go down the other half of the rabbit hole. What is HsModule? The documentation says "All we actually declare here is the top-level structure for a module", but it does not explain why name is a parameter. Never mind: we were looking for the declarations in a module, and specifically the left-hand sides of functions. The [LHsDecl name] part of the HsModule name seemed promising, but what is an LHsDecl? "Left-Hand side Declaration"? That's exactly what we need! Oh, it's just an alias for Located (HsDecl name) and has nothing to do with the left-hand side of anything. That's anticlimactic.
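To make the name parameter and the L prefix concrete, here is another toy model, again a simplification rather than GHC's real syntax tree. GHC parameterises its AST over the identifier type because that type changes between compiler passes: in the GHC versions of that era, the parser produced RdrNames, the renamer replaced them with Names, and the type checker with Ids (newer releases use pass markers such as GhcPs instead). The hsmodDecls field and the ValD, SigD and FunBind constructors borrow GHC's names, but their shapes here are drastically simplified, and functionLhs and renameModule are hypothetical helpers written for this sketch.

```haskell
-- A toy model of a module's syntax tree, parameterised over the identifier
-- type @name@ the way GHC's is. Not GHC's real definitions.

-- | Hypothetical stripped-down span: just (line, column).
type SrcSpan = (Int, Int)

-- | Located: a value tagged with where it came from.
data Located e = L SrcSpan e
  deriving Show

-- | Haskell Binding (toy): a function name and its argument patterns as text.
data HsBind name = FunBind name [String]
  deriving Show

-- | Haskell Declaration (toy).
data HsDecl name
  = ValD (HsBind name)   -- value declaration (a binding)
  | SigD [name] String   -- type signature; the type is kept as plain text
  deriving Show

-- | Located Haskell Declaration: the "L" means Located, not "left-hand side".
type LHsDecl name = Located (HsDecl name)

-- | Haskell Module (toy): only its top-level declarations.
newtype HsModule name = HsModule { hsmodDecls :: [LHsDecl name] }

-- | Collect the function name and argument patterns on the left-hand side of
--   every top-level function binding, skipping signatures and locations.
functionLhs :: HsModule name -> [(name, [String])]
functionLhs m = [ (f, args) | L _ (ValD (FunBind f args)) <- hsmodDecls m ]

-- | Hypothetical stand-in for a compiler pass that changes the identifier
--   type, which is why 'HsModule' takes @name@ as a parameter.
renameModule :: (a -> b) -> HsModule a -> HsModule b
renameModule f (HsModule ds) = HsModule (map renameDecl ds)
  where
    renameDecl (L loc (ValD (FunBind n ps))) = L loc (ValD (FunBind (f n) ps))
    renameDecl (L loc (SigD ns ty))          = L loc (SigD (map f ns) ty)

-- | A parsed module in which plain Strings stand in for parser-stage names.
exampleModule :: HsModule String
exampleModule = HsModule
  [ L (1, 1) (SigD ["double"] "Int -> Int")
  , L (2, 1) (ValD (FunBind "double" ["x"]))
  , L (4, 1) (ValD (FunBind "main" []))
  ]
```

For instance, functionLhs exampleModule evaluates to [("double",["x"]),("main",[])], and renameModule (\n -> ("Main", n)) turns the HsModule String into an HsModule (String, String) of module-qualified names while keeping every location intact.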
This story goes on for quite a bit longer. We spent a few hours trying to figure out how to use the GHC API, but we felt no more confident afterwards than when we started, so we decided to use haskell-src-exts instead.
The reason the GHC API was so confusing was twofold:
Most of the GHC API is undocumented.
Most of the names in the GHC API are abbreviations.
I had been trying to find something to do for my first GHC contribution, so I decided to contribute documentation. In particular, I set out to add the expansion of every abbreviation I could find in the API's names to the corresponding documentation, so that the next person who wants to write a static analysis tool will not be discouraged by the API documentation.
The "how to contribute" guides for GHC are very good. The newcomers page, together with the 'How to contribute a patch to GHC' page, got me up and running quickly. One task, one Differential revision, 13 commits and a large number of comments from the very helpful GHC crew later, the GHC API had its confusing abbreviations expanded in its documentation.
Hopefully the GHC API will be less confusing to newcomers in the future.