This post announces the new mergeful
library. In this third part, we describe how mergeful can help a server and its clients agree on a collection of values. This synchronisation mechanism will be the basis of a sync server for Smos.
Why
The original use-case for this library came from a sync server for Smos. In particular, such a sync server would need to be able to synchronise (smos) files and somehow resolve conflicts. I wanted to compartmentalise the complexity as much as possible, and keep the library generic in the values that would be synced. The smos sync server would then use (smos) files as the values in the collection.
How
Pieces of the puzzle
The workflow for syncing an item is very similar to the workflow for syncing a value. The only difference is that there is no separate initial request necessary. For reference, here is the entire flow again:
A central server and one or more clients want to cooperatively agree on zero or one value of type a
. Clients store a ClientStore a
value and the server stores a ServerStore a
value. Clients regularly produce an SyncRequest a
value from their ClientStore
using a makeSyncRequest :: ClientStore a -> SyncRequest a
function.
When the server receives an SyncRequest a
value, it uses its ServerStore a
and an processServerSync :: ServerStore a -> SyncRequest a -> (SyncResponse a, ServerStore a)
function to produce an SyncResponse a
and a new ServerStore a
It then stores the new ServerStore a
and sends the SyncResponse a
back to the client.
When the client receives the SyncResponse a
, it uses a mergeSyncResponse :: ClientStore a -> SyncResponse a -> ClientStore a
function to update its ClientStore a
to reflect the new synchronised state.
The following diagram should help:
More about the particulars of these types and functions later.
Rejected ideas
Agreeing on a collection sounds like a special case of agreeing on a value where the value is a collection. It is important to realise that in the typical case of syncing a collection, there will be few conflicts on the same value. However, if we regard the collection as a single value, even modifications that aren't strictly conflicting would be recorded as a conflict. Such a conflict would require the server to send over the entire collection again. Instead, we will add special support for synchronising collections, because it is a common case.
The mergeful solution
We will implement the syncing of collections on top of the syncing of items. In order to do that, without constantly creating duplicates, we must somehow give each item a name. Indeed, in order to compare a ClientItem
with a ServerItem
, we need to know which two in the collection to compare. The sollution that we've chosen is to use unique identifiers generated by the server. It is the server's responsibility to ensure that these identifiers are and remain unique. The server could, for example, use incremented identifiers (like in SQL) or randomly generated UUIDs.
Next, to define the ClientStore
type, we will turn the ClientItem
sum type into a product type:
data ClientStore i a =
ClientStore
clientStoreAddedItems :: Map ClientId a
{-- ^ These items are new locally but have not been synced to the server yet.
clientStoreSyncedItems :: Map i (Timed a)
,-- ^ These items have been synced at their respective 'ServerTime's.
clientStoreSyncedButChangedItems :: Map i (Timed a)
,-- ^ These items have been synced at their respective 'ServerTime's
-- but modified locally since then.
clientStoreDeletedItems :: Map i ServerTime
,-- ^ These items have been deleted locally after they were synced
-- but the server has not been notified of that yet.
}
Note that the added items do not have identifiers yet, because the server has not generated any for them. Instead the client needs to maintain a client-side identifier for it. This is just a simple newtype around a Word64
. The rest of the items each already have a server identifer, and the validity constraint will ensure that there are no duplicates:
instance (Validity i, Ord i, Validity a) => Validity (ClientStore i a) where
@ClientStore {..} =
validate csmconcat
[ genericValidate cs"There are no duplicate IDs" $
, declare $
distinct concat $
[ M.keys clientStoreSyncedItems
, M.keys clientStoreSyncedButChangedItems
, M.keys clientStoreDeletedItems
] ]
The 'SyncRequest' type looks similar again, again without the actual value for non-modified items:
data SyncRequest i a =
SyncRequest
syncRequestNewItems :: Map ClientId a
{-- ^ These items are new locally but have not been synced to the server yet.
syncRequestKnownItems :: Map i ServerTime
,-- ^ These items have been synced at their respective 'ServerTime's.
syncRequestKnownButChangedItems :: Map i (Timed a)
,-- ^ These items have been synced at their respective 'ServerTime's
-- but modified locally since then.
syncRequestDeletedItems :: Map i ServerTime
,-- ^ These items have been deleted locally after they were synced
-- but the server has not been notified of that yet.
}
The ServerStore
can now actually look very simple. Just a collection of items, each with their unique identifier:
newtype ServerStore i a = ServerStore { serverStoreItems :: Map i (Timed a) }
The SyncResponse
is actually very interesting, but it has been made easy by the work on syncing items. Indeed, we just need a field for every constructor in the ItemSyncResponse
type:
data SyncResponse i a =
SyncResponse
syncResponseClientAdded :: Map ClientId (i, ServerTime)
{-- ^ The client added these items and server has succesfully been made aware of that.
--
-- The client needs to update their server times
syncResponseClientChanged :: Map i ServerTime
,-- ^ The client changed these items and server has succesfully been made aware of that.
--
-- The client needs to update their server times
syncResponseClientDeleted :: Set i
,-- ^ The client deleted these items and server has succesfully been made aware of that.
--
-- The client can delete them from its deleted items
syncResponseServerAdded :: Map i (Timed a)
,-- ^ These items have been added on the server side
--
-- The client should add them too.
syncResponseServerChanged :: Map i (Timed a)
,-- ^ These items have been modified on the server side.
--
-- The client should modify them too.
syncResponseServerDeleted :: Set i
,-- ^ These items were deleted on the server side
--
-- The client should delete them too
syncResponseConflicts :: Map i (Timed a)
,-- ^ These are conflicts where the server and the client both have an item, but it is different.
--
-- The server kept its part of each, the client can either take whatever the server gave them
-- or deal with the conflicts somehow, and then try to re-sync.
syncResponseConflictsClientDeleted :: Map i (Timed a)
,-- ^ These are conflicts where the server has an item but the client does not.
--
-- The server kept its item, the client can either take whatever the server gave them
-- or deal with the conflicts somehow, and then try to re-sync.
syncResponseConflictsServerDeleted :: Set i
,-- ^ These are conflicts where the server has no item but the client has a modified item.
--
-- The server left its item deleted, the client can either delete its items too
-- or deal with the conflicts somehow, and then try to re-sync.
}
Actual synchronization
The complex work of the actual synchronisation has already been done in the item synchronisation. The complexity of collection synchronisation itself comes from matching up client items and server items up in pains on the server side, and from merging the SyncResponse
back into the ClientStore
afterwards.
Creating the SyncRequest
is easy. We just need to not send over the values that have not been modified:
makeSyncRequest :: ClientStore i a -> SyncRequest i a
ClientStore {..} =
makeSyncRequest SyncRequest
= clientStoreAddedItems
{ syncRequestNewItems = M.map timedTime clientStoreSyncedItems
, syncRequestKnownItems = clientStoreSyncedButChangedItems
, syncRequestKnownButChangedItems = clientStoreDeletedItems
, syncRequestDeletedItems }
Processing a 'SyncRequest' on the server-side is tricky. This is a sketch of the solution with some helper functions are left out. These helper functions are left as an exercise to the reader, and the solutions can be found here.
processServerSync ::
forall i a m. (Ord i, Monad m)
=> m i -- ^ The action that is guaranteed to generate unique identifiers
-> ServerStore i a
-> SyncRequest i a
-> m (SyncResponse i a, ServerStore i a)
ServerStore {..} sr@SyncRequest {..}
processServerSync genId -- Make tuples of requests for all of the items that only had a client identifier.
= do
let unidentifedPairs :: Map ClientId (ServerItem a, ItemSyncRequest a)
= M.map (\a -> (ServerEmpty, ItemSyncRequestNew a)) syncRequestNewItems
unidentifedPairs -- Make tuples of results for each of the unidentifier tuples.
unidentifedResults :: Map ClientId (ItemSyncResponse a, ServerItem a)
= M.map (uncurry processServerItemSync) unidentifedPairs
unidentifedResults <- generateIdentifiersFor genId unidentifedResults
generatedResults -- Gather the items that had a server identifier already.
let clientIdentifiedSyncRequests :: Map i (ItemSyncRequest a)
= identifiedItemSyncRequests sr
clientIdentifiedSyncRequests -- Make 'ServerItem's for each of the items on the server side
serverIdentifiedItems :: Map i (ServerItem a)
= M.map ServerFull serverStoreItems
serverIdentifiedItems -- Match up client items with server items by their id.
thesePairs :: Map i (These (ServerItem a) (ItemSyncRequest a))
= unionTheseMaps serverIdentifiedItems clientIdentifiedSyncRequests
thesePairs -- Make tuples of server 'ServerItem's and 'ItemSyncRequest's for each of the items with an id
requestPairs :: Map i (ServerItem a, ItemSyncRequest a)
= M.map (fromThese ServerEmpty ItemSyncRequestPoll) thesePairs
requestPairs -- Make tuples of results for each of the tuplus that had a server identifier.
identifiedResults :: Map i (ItemSyncResponse a, ServerItem a)
= M.map (uncurry processServerItemSync) requestPairs
identifiedResults -- Put together the results together
let allResults :: Map (Identifier i) (ItemSyncResponse a, ServerItem a)
=
allResults
M.unionOnlyServer identifiedResults)
(M.mapKeys uncurry BothServerAndClient) generatedResults)
(M.mapKeys (pure $ produceSyncResults allResults
Merging the SyncResponse
back into the ClientStore
is similarly tricky, and is left as an exercise to the reader again. The solution can be found here.
Testing the implementation
There are a few interesting aspects to the tests. In particular, apart from functionality regarding conlficts, all client-side merge functions should act in the same way. This means that we can make a test suite combinator for them:
mergeFunctionSpec ::
forall a. (Show a, Ord a, GenValid a)
=> (forall i. Ord i =>
ClientStore i a -> SyncResponse i a -> ClientStore i a)
-> Spec
Here we are using {-# LANGUAGE RankNTypes #-}
to describe that the caller can decide on the parameter a
(the values to synchronise), but the callee can decide on the parameter i
(the identifiers for the values). The callee (the test suite combinator), wants to both use pseudorandomly generated UUID
s and also specify that the type of the identifiers should not matter to the merge function.
To generate the UUID
s, the test suite uses a little state transformer monad:
newtype D m a =
D
unD :: StateT StdGen m a
{
}deriving (Generic, Functor, Applicative, Monad, MonadState StdGen, MonadTrans, MonadIO)
evalD :: D Identity a -> a
= runIdentity . evalDM
evalD
runD :: D Identity a -> StdGen -> (a, StdGen)
= runState . unD
runD
evalDM :: Functor m => D m a -> m a
= fst <$> runDM d (mkStdGen 42)
evalDM d
runDM :: D m a -> StdGen -> m (a, StdGen)
= runStateT . unD
runDM
genD :: Monad m => D m UUID
= do
genD <- get
r let (u, r') = random r
put r'pure u
Note that, for the sake of test reproducibility, it is important that the seed of the random generator is in fact fixed. Here we chose 42
.
The mergeFunctionSpec
test suite combinator is called with three different merge functions:
mergeSyncResponseFromServer
mergeSyncResponseIgnoreProblems
A custom merge function that uses the GCounter CRDT to resolve merge conflicts:
"Syncing with mergeSyncResponseUsingStrategy with a GCounter" $
describe @Int $
mergeFunctionSpec
mergeSyncResponseUsingStrategyItemMergeStrategy
=
{ itemMergeStrategyMergeChangeConflict Timed serverItem t) -> Timed (max clientItem serverItem) t
\clientItem (= \serverItem -> Just serverItem
, itemMergeStrategyMergeClientDeletedConflict = \_ -> Nothing
, itemMergeStrategyMergeServerDeletedConflict }
We will not discuss every test here, but you can be sure that it is quite comprehensive. Here is a small overview
Data.Mergeful.Collection initialServerStore is valid initialSyncRequest is valid emptySyncResponse is valid makeSyncRequest produces valid requests mergeAddedItems produces valid results mergeSyncedButChangedItems produces valid results mergeAddedItems produces valid results mergeSyncedButChangedItems produces valid results mergeDeletedItems produces valid results mergeSyncResponseIgnoreProblems produces valid requests processServerSync produces valid tuples of a response and a store Syncing with mergeSyncResponseIgnoreProblems Single client Multi-item succesfully downloads everything from the server for an empty client succesfully uploads everything to the server for an empty server is idempotent with one client Multiple clients Single-item successfully syncs an addition accross to a second client successfully syncs a modification accross to a second client succesfully syncs a deletion across to a second client does not run into a conflict if two clients both try to sync a deletion Multiple items successfully syncs additions accross to a second client succesfully syncs deletions across to a second client does not run into a conflict if two clients both try to sync a deletion does not lose data after a conflict occurs Syncing with mergeSyncResponseFromServer Single client Multi-item succesfully downloads everything from the server for an empty client succesfully uploads everything to the server for an empty server is idempotent with one client Multiple clients Single-item successfully syncs an addition accross to a second client successfully syncs a modification accross to a second client succesfully syncs a deletion across to a second client does not run into a conflict if two clients both try to sync a deletion Multiple items successfully syncs additions accross to a second client succesfully syncs deletions across to a second client does not run into a conflict if two clients both try to sync a deletion mergeSyncResponseFromServer only differs from mergeSyncResponseIgnoreProblems on conflicts does not diverge after a conflict occurs Syncing with mergeSyncResponseUsingStrategy with a GCounter Single client Multi-item succesfully downloads everything from the server for an empty client succesfully uploads everything to the server for an empty server is idempotent with one client Multiple clients Single-item successfully syncs an addition accross to a second client successfully syncs a modification accross to a second client succesfully syncs a deletion across to a second client does not run into a conflict if two clients both try to sync a deletion Multiple items successfully syncs additions accross to a second client succesfully syncs deletions across to a second client does not run into a conflict if two clients both try to sync a deletion
References
The mergeful
library is available on Hackage. Mergeful originated in the work on Smos, Intray and Tickler. This post is part of an effort to encourage contributions to Smos. The simplest contribution could be to just try out smos and provide feedback on the experience. Smos is a purely functional semantic forest editor of a subset of YAML that is intended to replace Emacs' Org-mode for Getting Things Done.