Estimated reading time in Hakyll

Date 2016-06-05

As of this week, every post on this blog is prefaced with an Estimated Reading Time (ERT). It's about giving readers an honest estimate of how much time they will spend reading the post.

ERT for ethical blogging

Implementing ERT in Hakyll

To add an ERT to every post, we'll implement a new Hakyll Compiler. I was already using pandocCompiler, and its sibling pandocCompilerWithTransform is going to do most of the work:

    import           Hakyll
    import           Text.Pandoc.Definition
    import           Text.Pandoc.Options

    myCompiler :: Compiler (Item String)
    myCompiler
        = pandocCompilerWithTransform
            defaultHakyllReaderOptions
            defaultHakyllWriterOptions
            myTransform

The function pandocCompilerWithTransform expects a Pandoc transformation myTransform :: Pandoc -> Pandoc and applies it to the document that Pandoc has parsed, before that document is rendered. We'll implement this transformation such that it adds the ERT at the top of the blog post.

    myTransform :: Pandoc -> Pandoc
    myTransform p@(Pandoc meta blocks) = Pandoc meta (ert : blocks)
      where ert = Para [ SmallCaps [Str "[ERT: ", Str $ timeEstimateString p ++ "]"] ]

Concretely, the ERT will be the estimated amount of time required for an average reader to read the post.

    timeEstimateString :: Pandoc -> String
    timeEstimateString = toClockString . timeEstimateSeconds

    toClockString :: Int -> String
    toClockString i
        | i >= 60 * 60 = show hours   ++ "h" ++ show minutes ++ "m" ++ show seconds ++ "s"
        | i >= 60      = show minutes ++ "m" ++ show seconds ++ "s"
        | otherwise    = show seconds ++ "s"
      where
        hours   = i `quot` (60 * 60)
        minutes = (i `rem` (60 * 60)) `quot` 60
        seconds = i `rem` 60
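
To sanity-check the clock formatting, here is a small self-contained sketch of the same decomposition written with divMod. The names toClockString' and clockExamples are hypothetical and only there for illustration; the sketch assumes a non-negative number of seconds, for which it should agree with toClockString.

```haskell
-- Illustrative variant (hypothetical name), assuming a non-negative input.
toClockString' :: Int -> String
toClockString' i
    | h > 0     = show h ++ "h" ++ show m ++ "m" ++ show s ++ "s"
    | m > 0     = show m ++ "m" ++ show s ++ "s"
    | otherwise = show s ++ "s"
  where
    (totalMinutes, s) = i `divMod` 60            -- split off the seconds
    (h, m)            = totalMinutes `divMod` 60 -- split the rest into hours and minutes

-- A few sample inputs, one for each guard.
clockExamples :: [(Int, String)]
clockExamples = [ (n, toClockString' n) | n <- [45, 125, 3725] ]
```

Evaluating clockExamples in GHCi gives [(45,"45s"),(125,"2m5s"),(3725,"1h2m5s")]. There is no zero-padding, so exactly one hour is rendered as 1h0m0s.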

According to Google, the average reading speed is around $300$ words per minute, or $5$ words per second.

    timeEstimateSeconds :: Pandoc -> Int
    timeEstimateSeconds = (`quot` 5) . nrWords

Also according to Google, there are about $5$ letters in an English word on average.

    nrWords :: Pandoc -> Int
    nrWords = (`quot` 5) . nrLetters
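
Chaining the two divisions means the estimate is the letter count divided by $25$, with integer truncation along the way. Below is a minimal sketch of that arithmetic on plain numbers, with no Pandoc involved. The names lettersPerWord, wordsPerSecond and secondsForLetters are made up for this illustration and are not part of the code in this post.

```haskell
-- Constants taken from the figures quoted in the text.
lettersPerWord :: Int
lettersPerWord = 5

wordsPerSecond :: Int
wordsPerSecond = 5

-- Hypothetical helper: letters -> words -> seconds, truncating at each step.
secondsForLetters :: Int -> Int
secondsForLetters = (`quot` wordsPerSecond) . (`quot` lettersPerWord)
```

For example, secondsForLetters 7500 is 300 (1500 words, so a five-minute read), while secondsForLetters 24 is 0, because 24 letters truncate to 4 words, which is less than one second's worth of reading.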

Finally, there's the matter of actually counting the number of letters in the post. The code is lengthy and boring, but here it is anyway, in case you would like to reuse it.

    nrLetters :: Pandoc -> Int
    nrLetters (Pandoc _ bs) = sum $ map cb bs
      where
        cbs = sum . map cb
        cbss = sum . map cbs
        cbsss = sum . map cbss

        cb :: Block -> Int
        cb (Plain is) = cis is
        cb (Para is) = cis is
        cb (CodeBlock _ s) = length s
        cb (RawBlock _ s) = length s
        cb (BlockQuote bs) = cbs bs
        cb (OrderedList _ bss) = cbss bss
        cb (BulletList bss) = cbss bss
        cb (DefinitionList ls) = sum $ map (\(is, bss) -> cis is + cbss bss) ls
        cb (Header _ _ is) = cis is
        cb HorizontalRule = 0
        cb (Table is _ _ tc tcs) = cis is + cbss tc + cbsss tcs
        cb (Div _ bs) = cbs bs
        cb Null = 0

        cis = sum . map ci
        ciss = sum . map cis

        ci :: Inline -> Int
        ci (Str s) = length s
        ci (Emph is) = cis is
        ci (Strong is) = cis is
        ci (Strikeout is) = cis is
        ci (Superscript is) = cis is
        ci (Subscript is) = cis is
        ci (SmallCaps is) = cis is
        ci (Quoted _ is) = cis is
        ci (Cite _ is) = cis is
        ci (Code _ s) = length s
        ci Space = 1
        ci SoftBreak = 1
        ci LineBreak = 1
        ci (Math _ s) = length s
        ci (RawInline _ s) = length s
        ci (Link _ is (_, s)) = cis is + length s
        ci (Image _ is (_, s)) = cis is + length s
        ci (Note bs) = cbs bs
        ci (Span _ is) = cis is
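
If the explicit recursion is more than you want to maintain, the generic traversal in Text.Pandoc.Walk can do the walking. The following is only a sketch: nrLettersGeneric, inlineSize and blockSize are hypothetical names, and it assumes a String-based pandoc-types like the one used in this post (newer releases use Text, where length would need replacing). It should give the same or a very similar count to nrLetters; it may count slightly more, for instance the prefix and suffix inlines of citations, which query also visits.

```haskell
import Data.Monoid (Sum (..), (<>))
import Text.Pandoc.Definition
import Text.Pandoc.Walk (query)

-- Hypothetical alternative to nrLetters, built on `query`.
-- Assumption: Str, Code, etc. carry String, as in the pandoc-types of this post.
nrLettersGeneric :: Pandoc -> Int
nrLettersGeneric (Pandoc _ bs) =
    getSum (query inlineSize bs <> query blockSize bs)
  where
    -- Only leaf inlines contribute; `query` descends into nested inlines itself.
    inlineSize :: Inline -> Sum Int
    inlineSize (Str s)            = Sum (length s)
    inlineSize (Code _ s)         = Sum (length s)
    inlineSize (Math _ s)         = Sum (length s)
    inlineSize (RawInline _ s)    = Sum (length s)
    inlineSize (Link _ _ (_, s))  = Sum (length s)
    inlineSize (Image _ _ (_, s)) = Sum (length s)
    inlineSize Space              = Sum 1
    inlineSize SoftBreak          = Sum 1
    inlineSize LineBreak          = Sum 1
    inlineSize _                  = Sum 0

    -- Blocks that hold raw text directly rather than inlines.
    blockSize :: Block -> Sum Int
    blockSize (CodeBlock _ s) = Sum (length s)
    blockSize (RawBlock _ s)  = Sum (length s)
    blockSize _               = Sum 0
```

The wildcard cases mean that constructors not listed here contribute nothing themselves, while their children are still visited by query. The trade-off is that you no longer get a compiler warning when a new constructor that carries text is added.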