# Announcing mergeful, part 3: Cooperative agreement on a collection of values.

Date 2019-12-28

This post announces the new mergeful library. In this third part, we describe how mergeful can help a server and its clients agree on a collection of values. This synchronisation mechanism will be the basis of a sync server for Smos.

### Why

The original use-case for this library came from a sync server for Smos. In particular, such a sync server would need to be able to synchronise (smos) files and somehow resolve conflicts. I wanted to compartmentalise the complexity as much as possible, and keep the library generic in the values that would be synced. The smos sync server would then use (smos) files as the values in the collection.

### How

#### Pieces of the puzzle

The workflow for syncing an item is very similar to the workflow for syncing a value. The only difference is that there is no separate initial request necessary. For reference, here is the entire flow again:

A central server and one or more clients want to cooperatively agree on zero or one value of type a. Clients store a ClientStore a value and the server stores a ServerStore a value. Clients regularly produce an SyncRequest a value from their ClientStore using a makeSyncRequest :: ClientStore a -> SyncRequest a function.

When the server receives an SyncRequest a value, it uses its ServerStore a and an processServerSync :: ServerStore a -> SyncRequest a -> (SyncResponse a, ServerStore a) function to produce an SyncResponse a and a new ServerStore a It then stores the new ServerStore a and sends the SyncResponse a back to the client.

When the client receives the SyncResponse a, it uses a mergeSyncResponse :: ClientStore a -> SyncResponse a -> ClientStore a function to update its ClientStore a to reflect the new synchronised state.

The following diagram should help:

More about the particulars of these types and functions later.

#### Rejected ideas

Agreeing on a collection sounds like a special case of agreeing on a value where the value is a collection. It is important to realise that in the typical case of syncing a collection, there will be few conflicts on the same value. However, if we regard the collection as a single value, even modifications that aren't strictly conflicting would be recorded as a conflict. Such a conflict would require the server to send over the entire collection again. Instead, we will add special support for synchronising collections, because it is a common case.

#### The mergeful solution

We will implement the syncing of collections on top of the syncing of items. In order to do that, without constantly creating duplicates, we must somehow give each item a name. Indeed, in order to compare a ClientItem with a ServerItem, we need to know which two in the collection to compare. The sollution that we've chosen is to use unique identifiers generated by the server. It is the server's responsibility to ensure that these identifiers are and remain unique. The server could, for example, use incremented identifiers (like in SQL) or randomly generated UUIDs.

Next, to define the ClientStore type, we will turn the ClientItem sum type into a product type:

data ClientStore i a =
ClientStore
{ clientStoreAddedItems :: Map ClientId a
-- ^ These items are new locally but have not been synced to the server yet.
, clientStoreSyncedItems :: Map i (Timed a)
-- ^ These items have been synced at their respective 'ServerTime's.
, clientStoreSyncedButChangedItems :: Map i (Timed a)
-- ^ These items have been synced at their respective 'ServerTime's
-- but modified locally since then.
, clientStoreDeletedItems :: Map i ServerTime
-- ^ These items have been deleted locally after they were synced
-- but the server has not been notified of that yet.
}


Note that the added items do not have identifiers yet, because the server has not generated any for them. Instead the client needs to maintain a client-side identifier for it. This is just a simple newtype around a Word64. The rest of the items each already have a server identifer, and the validity constraint will ensure that there are no duplicates:

instance (Validity i, Ord i, Validity a) => Validity (ClientStore i a) where
validate cs@ClientStore {..} =
mconcat
[ genericValidate cs
, declare "There are no duplicate IDs" $distinct$
concat $[ M.keys clientStoreSyncedItems , M.keys clientStoreSyncedButChangedItems , M.keys clientStoreDeletedItems ] ]  The 'SyncRequest' type looks similar again, again without the actual value for non-modified items: data SyncRequest i a = SyncRequest { syncRequestNewItems :: Map ClientId a -- ^ These items are new locally but have not been synced to the server yet. , syncRequestKnownItems :: Map i ServerTime -- ^ These items have been synced at their respective 'ServerTime's. , syncRequestKnownButChangedItems :: Map i (Timed a) -- ^ These items have been synced at their respective 'ServerTime's -- but modified locally since then. , syncRequestDeletedItems :: Map i ServerTime -- ^ These items have been deleted locally after they were synced -- but the server has not been notified of that yet. }  The ServerStore can now actually look very simple. Just a collection of items, each with their unique identifier: newtype ServerStore i a = ServerStore { serverStoreItems :: Map i (Timed a) }  The SyncResponse is actually very interesting, but it has been made easy by the work on syncing items. Indeed, we just need a field for every constructor in the ItemSyncResponse type: data SyncResponse i a = SyncResponse { syncResponseClientAdded :: Map ClientId (i, ServerTime) -- ^ The client added these items and server has succesfully been made aware of that. -- -- The client needs to update their server times , syncResponseClientChanged :: Map i ServerTime -- ^ The client changed these items and server has succesfully been made aware of that. -- -- The client needs to update their server times , syncResponseClientDeleted :: Set i -- ^ The client deleted these items and server has succesfully been made aware of that. -- -- The client can delete them from its deleted items , syncResponseServerAdded :: Map i (Timed a) -- ^ These items have been added on the server side -- -- The client should add them too. , syncResponseServerChanged :: Map i (Timed a) -- ^ These items have been modified on the server side. -- -- The client should modify them too. , syncResponseServerDeleted :: Set i -- ^ These items were deleted on the server side -- -- The client should delete them too , syncResponseConflicts :: Map i (Timed a) -- ^ These are conflicts where the server and the client both have an item, but it is different. -- -- The server kept its part of each, the client can either take whatever the server gave them -- or deal with the conflicts somehow, and then try to re-sync. , syncResponseConflictsClientDeleted :: Map i (Timed a) -- ^ These are conflicts where the server has an item but the client does not. -- -- The server kept its item, the client can either take whatever the server gave them -- or deal with the conflicts somehow, and then try to re-sync. , syncResponseConflictsServerDeleted :: Set i -- ^ These are conflicts where the server has no item but the client has a modified item. -- -- The server left its item deleted, the client can either delete its items too -- or deal with the conflicts somehow, and then try to re-sync. }  #### Actual synchronization The complex work of the actual synchronisation has already been done in the item synchronisation. The complexity of collection synchronisation itself comes from matching up client items and server items up in pains on the server side, and from merging the SyncResponse back into the ClientStore afterwards. Creating the SyncRequest is easy. We just need to not send over the values that have not been modified: makeSyncRequest :: ClientStore i a -> SyncRequest i a makeSyncRequest ClientStore {..} = SyncRequest { syncRequestNewItems = clientStoreAddedItems , syncRequestKnownItems = M.map timedTime clientStoreSyncedItems , syncRequestKnownButChangedItems = clientStoreSyncedButChangedItems , syncRequestDeletedItems = clientStoreDeletedItems }  Processing a 'SyncRequest' on the server-side is tricky. This is a sketch of the solution with some helper functions are left out. These helper functions are left as an exercise to the reader, and the solutions can be found here. processServerSync :: forall i a m. (Ord i, Monad m) => m i -- ^ The action that is guaranteed to generate unique identifiers -> ServerStore i a -> SyncRequest i a -> m (SyncResponse i a, ServerStore i a) processServerSync genId ServerStore {..} sr@SyncRequest {..} -- Make tuples of requests for all of the items that only had a client identifier. = do let unidentifedPairs :: Map ClientId (ServerItem a, ItemSyncRequest a) unidentifedPairs = M.map (\a -> (ServerEmpty, ItemSyncRequestNew a)) syncRequestNewItems -- Make tuples of results for each of the unidentifier tuples. unidentifedResults :: Map ClientId (ItemSyncResponse a, ServerItem a) unidentifedResults = M.map (uncurry processServerItemSync) unidentifedPairs generatedResults <- generateIdentifiersFor genId unidentifedResults -- Gather the items that had a server identifier already. let clientIdentifiedSyncRequests :: Map i (ItemSyncRequest a) clientIdentifiedSyncRequests = identifiedItemSyncRequests sr -- Make 'ServerItem's for each of the items on the server side serverIdentifiedItems :: Map i (ServerItem a) serverIdentifiedItems = M.map ServerFull serverStoreItems -- Match up client items with server items by their id. thesePairs :: Map i (These (ServerItem a) (ItemSyncRequest a)) thesePairs = unionTheseMaps serverIdentifiedItems clientIdentifiedSyncRequests -- Make tuples of server 'ServerItem's and 'ItemSyncRequest's for each of the items with an id requestPairs :: Map i (ServerItem a, ItemSyncRequest a) requestPairs = M.map (fromThese ServerEmpty ItemSyncRequestPoll) thesePairs -- Make tuples of results for each of the tuplus that had a server identifier. identifiedResults :: Map i (ItemSyncResponse a, ServerItem a) identifiedResults = M.map (uncurry processServerItemSync) requestPairs -- Put together the results together let allResults :: Map (Identifier i) (ItemSyncResponse a, ServerItem a) allResults = M.union (M.mapKeys OnlyServer identifiedResults) (M.mapKeys (uncurry BothServerAndClient) generatedResults) pure$ produceSyncResults allResults


Merging the SyncResponse back into the ClientStore is similarly tricky, and is left as an exercise to the reader again. The solution can be found here.

#### Testing the implementation

There are a few interesting aspects to the tests. In particular, apart from functionality regarding conlficts, all client-side merge functions should act in the same way. This means that we can make a test suite combinator for them:

mergeFunctionSpec ::
forall a. (Show a, Ord a, GenValid a)
=> (forall i. Ord i =>
ClientStore i a -> SyncResponse i a -> ClientStore i a)
-> Spec


Here we are using {-# LANGUAGE RankNTypes #-} to describe that the caller can decide on the parameter a (the values to synchronise), but the callee can decide on the parameter i (the identifiers for the values). The callee (the test suite combinator), wants to both use pseudorandomly generated UUIDs and also specify that the type of the identifiers should not matter to the merge function.

To generate the UUIDs, the test suite uses a little state transformer monad:

newtype D m a =
D
{ unD :: StateT StdGen m a
}

evalD :: D Identity a -> a
evalD = runIdentity . evalDM

runD :: D Identity a -> StdGen -> (a, StdGen)
runD = runState . unD

evalDM :: Functor m => D m a -> m a
evalDM d = fst <$> runDM d (mkStdGen 42) runDM :: D m a -> StdGen -> m (a, StdGen) runDM = runStateT . unD genD :: Monad m => D m UUID genD = do r <- get let (u, r') = random r put r' pure u  Note that, for the sake of test reproducibility, it is important that the seed of the random generator is in fact fixed. Here we chose 42. The mergeFunctionSpec test suite combinator is called with three different merge functions: • mergeSyncResponseFromServer • mergeSyncResponseIgnoreProblems • A custom merge function that uses the GCounter CRDT to resolve merge conflicts:  describe "Syncing with mergeSyncResponseUsingStrategy with a GCounter"$
mergeFunctionSpec @Int \$
mergeSyncResponseUsingStrategy
ItemMergeStrategy
{ itemMergeStrategyMergeChangeConflict =
\clientItem (Timed serverItem t) -> Timed (max clientItem serverItem) t
, itemMergeStrategyMergeClientDeletedConflict = \serverItem -> Just serverItem
, itemMergeStrategyMergeServerDeletedConflict = \_ -> Nothing
}


We will not discuss every test here, but you can be sure that it is quite comprehensive. Here is a small overview

Data.Mergeful.Collection
initialServerStore
is valid
initialSyncRequest
is valid
emptySyncResponse
is valid
makeSyncRequest
produces valid requests
produces valid results
mergeSyncedButChangedItems
produces valid results
produces valid results
mergeSyncedButChangedItems
produces valid results
mergeDeletedItems
produces valid results
mergeSyncResponseIgnoreProblems
produces valid requests
processServerSync
produces valid tuples of a response and a store
Syncing with mergeSyncResponseIgnoreProblems
Single client
Multi-item
succesfully uploads everything to the server for an empty server
is idempotent with one client
Multiple clients
Single-item
successfully syncs an addition accross to a second client
successfully syncs a modification accross to a second client
succesfully syncs a deletion across to a second client
does not run into a conflict if two clients both try to sync a deletion
Multiple items
successfully syncs additions accross to a second client
succesfully syncs deletions across to a second client
does not run into a conflict if two clients both try to sync a deletion
does not lose data after a conflict occurs
Syncing with mergeSyncResponseFromServer
Single client
Multi-item
succesfully uploads everything to the server for an empty server
is idempotent with one client
Multiple clients
Single-item
successfully syncs an addition accross to a second client
successfully syncs a modification accross to a second client
succesfully syncs a deletion across to a second client
does not run into a conflict if two clients both try to sync a deletion
Multiple items
successfully syncs additions accross to a second client
succesfully syncs deletions across to a second client
does not run into a conflict if two clients both try to sync a deletion
mergeSyncResponseFromServer
only differs from mergeSyncResponseIgnoreProblems on conflicts
does not diverge after a conflict occurs
Syncing with mergeSyncResponseUsingStrategy with a GCounter
Single client
Multi-item
succesfully uploads everything to the server for an empty server
is idempotent with one client
Multiple clients
Single-item
successfully syncs an addition accross to a second client
successfully syncs a modification accross to a second client
succesfully syncs a deletion across to a second client
does not run into a conflict if two clients both try to sync a deletion
Multiple items
successfully syncs additions accross to a second client
succesfully syncs deletions across to a second client
does not run into a conflict if two clients both try to sync a deletion


### References

The mergeful library is available on Hackage. Mergeful originated in the work on Smos, Intray and Tickler. This post is part of an effort to encourage contributions to Smos. The simplest contribution could be to just try out smos and provide feedback on the experience. Smos is a purely functional semantic forest editor of a subset of YAML that is intended to replace Emacs' Org-mode for Getting Things Done.

2019; year in review