Announcing mergeful, part 2: Cooperative agreement on zero or one values

This post announces the new mergeful library. In this second part, we describe how mergeful can help a server and its clients agree on zero or one value with safe merge conflicts.

Why

Extending the mergeful approach of agreeing on a single value to agreeing on a collection of values is not straightforward at all. Instead of making that jump immediately, we take the smaller step to agreeing on zero or one value. The reason that this step is so important is because a deleted item looks like the 'zero' case. More about this in the next blogpost.

How

Pieces of the puzzle

The workflow for syncing an item is very similar to the workflow for syncing a value. The only difference is that there is no separate initial request necessary. For reference, here is the entire flow again:

A central server and one or more clients want to cooperatively agree on zero or one value of type a. Clients store a ClientItem a value and the server stores a ServerItem a value. Clients regularly produce an ItemSyncRequest a value from their ClientStore using a makeItemSyncRequest :: ClientItem a -> ItemSyncRequest a function.

When the server receives an ItemSyncRequest a value, it uses its ServerItem a and an processServerItemSync :: ServerItem a -> ItemSyncRequest a -> (ItemSyncResponse a, ServerItem a) function to produce an ItemSyncResponse a and a new ServerItem a It then stores the new ServerItem a and sends the ItemSyncResponse a back to the client.

When the client receives the ItemSyncResponse a, it uses a mergeItemSyncResponse :: ClientItem a -> ItemSyncResponse a -> ClientItem a function to update its ClientItem a to reflect the new synchronised state.

The following diagram should help:

More about the particulars of these types and functions later.

Rejected ideas

One could be tempted to implement the 'item' case in terms of the 'value' case. You could argue that a ClientItem a should just be the same as a ClientValue (Maybe a) and that does make sense. However, there are two unfortunate consequences of such an approach.

The first is that the type system over-estimates the possible cases. For example, MergeConflict Nothing Nothing would be a valid value according to the types, while we already statically know that it should not be possible.

The second problem is that, in this approach, there is no difference between an added item and a modified item. Indeed, by a Just v value alone, you cannot tell whether this value changed from Just u to Just v or from Nothing to Just v. We would like to be able to make this distinction.

The mergeful solution

We will keep the overall approach of value synchronisation, and the ServerTime and Timed types. They were useful.

The first change we need to make is that the ClientItem type gets more constructors:

data ClientItem a
  -- | There is no item on the client side
  = ClientEmpty
  -- | There is is an item but the server is not aware of it yet.
  | ClientAdded !a
  -- | There is is an item and it has been synced with the server.
  | ClientItemSynced !(Timed a)
  -- | There is is an item and it has been synced with the server, but it has since been modified.
  | ClientItemSyncedButChanged !(Timed a)
  -- | There was an item, and it has been deleted locally, but the server has not been made aware of this.
  | ClientDeleted !ServerTime

Note that, because of the ClientAdded constructor, we can see the difference between an added and a modified item.

The ItemSyncRequest type is expanded in the same way. Again, the only difference is that there is no need to send over a synced value if it has not been modified.

data ItemSyncRequest a
  -- | There is no item locally
  = ItemSyncRequestPoll
  -- | There is an item locally that hasn't been synced to the server yet.
  | ItemSyncRequestNew !a
  -- | There is an item locally that was synced at the given 'ServerTime'
  | ItemSyncRequestKnown !ServerTime
  -- | There is an item locally that was synced at the given 'ServerTime'
  -- but it has been changed since then.
  | ItemSyncRequestKnownButChanged !(Timed a)
  -- | There was an item locally that has been deleted but the
  -- deletion wasn't synced to the server yet.
  | ItemSyncRequestDeletedLocally !ServerTime

The ServerItem type also needs to be a tad bit bigger:

data ServerItem a
  = ServerEmpty
  | ServerFull !(Timed a)

Note that ServerItem a is very similar to ServerValue (Maybe a).

So far so good, but now comes the complex part. There are 10 or 11 possible situations when it comes to an ItemSyncResponse. The server and the client could be in sync, and you can split this situation up into whether they were in sync on an empty value or on a full value. Those are one or two scenarios, depending on whether you split them up Both the server and the client separately could have caused one of these three transitions: an addition, a change or a deletion. Those are six more scenarios. Lastly, there are three possible conflicts. If the client and the server run into a conflict, it could be that they have made a conflicting modification, or because one of them deleted the item while the other changed it.

data ItemSyncResponse a
  -- | The client and server are fully in sync, and both empty
  --
  -- Nothing needs to be done at the client side.
  = ItemSyncResponseInSyncEmpty
  -- | The client and server are fully in sync.
  --
  -- Nothing needs to be done at the client side.
  | ItemSyncResponseInSyncFull
  -- | The client added an item and server has succesfully been made aware of that.
  --
  -- The client needs to update its server time
  | ItemSyncResponseClientAdded !ServerTime
  -- | The client changed an item and server has succesfully been made aware of that.
  --
  -- The client needs to update its server time
  | ItemSyncResponseClientChanged !ServerTime
  -- | The client deleted an item and server has succesfully been made aware of that.
  --
  -- Nothing needs to be done at the client side.
  | ItemSyncResponseClientDeleted
  -- | This item has been added on the server side
  --
  -- The client should add it too.
  | ItemSyncResponseServerAdded !(Timed a)
  -- | This item has been modified on the server side.
  --
  -- The client should modify it too.
  | ItemSyncResponseServerChanged !(Timed a)
  -- | The item was deleted on the server side
  --
  -- The client should delete it too.
  | ItemSyncResponseServerDeleted
  -- | A conflict occurred.
  --
  -- The server and the client both have an item, but it is different.
  -- The server kept its part, the client can either take whatever the server gave them
  -- or deal with the conflict somehow, and then try to re-sync.
  | ItemSyncResponseConflict !(Timed a) -- ^ The item at the server side
  -- | A conflict occurred.
  --
  -- The server has an item but the client does not.
  -- The server kept its part, the client can either take whatever the server gave them
  -- or deal with the conflict somehow, and then try to re-sync.
  | ItemSyncResponseConflictClientDeleted !(Timed a) -- ^ The item at the server side
  -- | A conflict occurred.
  --
  -- The client has a (modified) item but the server does not have any item.
  -- The server left its item deleted, the client can either delete its item too
  -- or deal with the conflict somehow, and then try to re-sync.
  | ItemSyncResponseConflictServerDeleted

Actual synchronization

It is important to realise that when I was first writing this library, I did not work through the problem in the same order as you are now reading through the solution. But once the types are firmly in place, the following synchronisation functions should be relatively straightforward to write.

The function to make a sync request is relatively simple.

makeItemSyncRequest :: ClientItem a -> ItemSyncRequest a

The function to process a sync request is inherently complex, but using literate programming, it is straightforward to work through.

processServerItemSync 
  :: ServerItem a
  -> ItemSyncRequest a
  -> (ItemSyncResponse a, ServerItem a)

The mergeItemSyncResponse function, similar to the mergeValueSyncResponse function, is not complex per-se, but again requires the programmer to either make a choice as to what will happen in specific situations or to make it a bit more complex and general.

These three functions are left as an exercise to the reader. The solutions can be found in the source code and are well-documented.

Testing the implementation

This module would never have been possible without thorough testing, and in this blogpost I would like to focus more on the testing than on the implementation.

Using validity-based testing, all of the tests were property tests. We let all the generators be derived automatically, see for example the generators for ClientItem:

instance GenUnchecked ServerTime
instance GenValid ServerTime

There is no omitted code here. This is the full code for the generators. The GenUnchecked instance has a default implementation using the Generic instance of ServerTime, and the GenValid instance has a default implementation using the GenUnchecked and Validity instances. The shrinking functions are also automatically generated in a similar way.

For the actual testing, the standard producesValidsOnValids property combinator came in very handy:

spec :: Spec
spec = do
  describe "makeValueSyncRequest" $
    it "produces valid requests" $ producesValidsOnValids (makeValueSyncRequest @Int)
  describe "processServerValueSync" $ do
    it "produces valid responses and stores" $ producesValidsOnValids2 (processServerValueSync @Int)

Again, no code is omitted here. This is the entire definition of the test.

Like for the value syncing, we used a nice idempotency property again:

  it "is idempotent with one client" $
    forAllValid $ \cstore1 ->
      forAllValid $ \sstore1 -> do
        let req1 = makeItemSyncRequest (cstore1 :: ClientItem Int)
            (resp1, sstore2) = processServerItemSync sstore1 req1
            cstore2 = mergeItemSyncResponseIgnoreProblems cstore1 resp1
            req2 = makeItemSyncRequest cstore2
            (resp2, sstore3) = processServerItemSync sstore2 req2
            cstore3 = mergeItemSyncResponseIgnoreProblems cstore2 resp2
        cstore2 `shouldBe` cstore3
        sstore2 `shouldBe` sstore3

Lastly, some custom tests came in handy for the specific cases for which we want to use the library. Here is one such example:

  describe "syncing" $ do
    it "succesfully syncs an addition across to a second client" $
      forAllValid $ \i -> do
        -- Client A has added an item 'i'.
        let cAstore1 = ClientAdded i
        -- Client B is empty.
        let cBstore1 = ClientEmpty
        -- The server is empty.
        let sstore1 = ServerEmpty
        -- Client A makes sync request 1.
        let req1 = makeItemSyncRequest cAstore1
        -- The server processes sync request 1.
        let (resp1, sstore2) = processServerItemSync @Int sstore1 req1
        let time = initialServerTime
        resp1 `shouldBe` ItemSyncResponseClientAdded time
        sstore2 `shouldBe` ServerFull (Timed i time)
        -- Client A merges the response.
        let cAstore2 = mergeItemSyncResponseIgnoreProblems cAstore1 resp1
        cAstore2 `shouldBe` ClientItemSynced (Timed i time)
        -- Client B makes sync request 2.
        let req2 = makeItemSyncRequest cBstore1
        -- The server processes sync request 2.
        let (resp2, sstore3) = processServerItemSync sstore2 req2
        resp2 `shouldBe` ItemSyncResponseServerAdded (Timed i time)
        sstore3 `shouldBe` ServerFull (Timed i time)
        -- Client B merges the response.
        let cBstore2 = mergeItemSyncResponseIgnoreProblems cBstore1 resp2
        cBstore2 `shouldBe` ClientItemSynced (Timed i time)
        -- Client A and Client B now have the same store.
        cAstore2 `shouldBe` cBstore2

References

The mergeful library is available on Hackage. Mergeful originated in the work on Smos, Intray and Tickler. This post is part of an effort to encourage contributions to Smos. The simplest contribution could be to just try out smos and provide feedback on the experience. Smos is a purely functional semantic forest editor of a subset of YAML that is intended to replace Emacs' Org-mode for Getting Things Done.