Time Sense – Lamport and Vector Clocks


Architecture, Database, Distributed Systems, Events, Microservices / Sunday, April 1st, 2018

The hurrier I go, the behinder I get.

Lewis Carroll

“Time is an illusion”, so said Einstein and who would dare argue with him on the subject of time? It is now a well established scientific fact that he is indeed right. Time is neither absolute nor independent in existence. Yet in our everyday life, time continues its illusory hold on us, an exceptional one at that, given that we haven’t been able to break free of it in spite of what has already been proven, by Einstein and other great minds following him. Our intuitive concept of time continues to be one of its absoluteness and its independent existence. It is so intuitive and ingrained in our habits that we naturally tend to extend this concept to everything we do and reason about, including to the world of computers as well.

However the concept of time is an even bigger illusion and a greater problem in the distributed world. I am specifically talking about distributed systems that communicate using asynchronous messaging. So why do we care about time in distributed systems ? We care because it affects the way we perceive the order of events in the system. At a fundamental level, ordering affects the behavior of our applications.

Leslie Lamport in his seminal paper ‘Time, Clocks, and the ordering of events in a distributed system‘ addresses this very topic and this paper is said to be the most important and the most cited paper in Computer Science. If you think about it, although we see the passage of time as a continuum, the way we actually experience it in life is through specific events.The same can be said about distributed systems too.

Time has no independent existence apart from the order of events by which we measure it.

Albert Einstein

So how do we order events ? The simplest answer is that we can order events by time, the physical clock time. This is in fact a very common approach on a single process which uses the local clock on the machine. In a distributed system, there are multiple clocks and for correctness of ordering, all the clocks should be accurate and precisely synchronized to the exact same time. This is a non-trivial task because of clock drift and clock skew. As the number of clocks to be synchronized increases, so does the difficulty in keeping them precise and accurate. Network Time Protocol (NTP) is one such old but popular protocol that we have used in the systems that I’ve worked on and I’ll just leave it at that. (There is always a residual error with clock synchronization due to effect of network latencies on messages used to synchronize the clocks + security issues etc).

A far more interesting approach is the concept and use of logical clocks. Both Lamport and Vector clocks belong to this category. Most modern distributed systems make use of logical clocks.

Lamport Clocks

Lamport Clocks are built around the basic constructs of events and ‘happened before‘ relationship. Events are either internal operations in processes or external messages sent and received between communicating processes. The ‘happened before‘ relationship is denoted as → and follows the below rules:

  • If a and b are events in the same process and a comes before b, then a → b
  • If a is the sending of a message from one process and b is the receiving of the same message at another process, then a → b
  • Transitively if a → b and b → c, then a → c

In these cases, the events a and b are also said to be causally related. If → relationship cannot be established between any two events, they are said to be concurrent. Now imagine assigning timestamps to all the events but not using the physical clocks. The timestamps are logical timestamps and as long as the timestamps obey the → causality relationship we should be able to order the events using these timestamps.

For causality, If a → b, then timestamp(a) → timestamp(b)

So we can see that for any two events within a process, timestamps as per the above rule are sufficient to order them. For external events, the sending of a message gets an earlier timestamp than the receiving of the same message as per the second point above and thus establish the correct ordering between them as well. Note that this creates a partial ordering among the set of all events.

The logical clock can be implemented by a simple counter and the algorithm operates as below:

  • All initial counter values are set to 0
  • The counter is incremented whenever an internal event occurs and when it sends a message. The counter value is piggybacked on the sent message as the timestamp
  • On receive of a message, the counter is set to max(local counter, received timestamp) + 1

Below is an illustration of  how Lamport timestamps are assigned.

Along the x axis are three processes P, Q, R and along the y axis is time. A few causaltity relationships and their timestamps are obvious, a → b, c →g, b → i etc. Of particular interest are the events a and d. Although they might look like there is an ordering as per the time axis, there is no causal relationship between them. We cannot say with certainty if one preceded the other and hence they represent a concurrent relationship denoted as a || d. Thus having Ts(a) < Ts(b) where Ts  = timestamp does not imply a → b. When looking at any pair of Lamport timestamps, we can wrongly conclude that there is a causality relationship where none exists between an independent pair of events.

The Lamport paper goes on to define an additional rule to impose a total ordering among all the events by using an arbitrary tie breaker such as the process id for cases where the timestamps are of equal value. The relationship is denoted by ⇒ and defined by the rule:

  • If a is an event in process Pi and b is an event in process Pj then a ⇒ b iff either {Ts(a) < Ts(b) } or { Ts(a) = Ts(b) and Pi < Pj}

Although it defines a total ordering, the issue with causality persists as in the partial order case.

Vector Clocks

Vector clocks solve the issue that is seen with Lamport clocks. Given a pair of two events with their vector timestamps, we can tell if they are causally related or concurrent without ambiguity. This is achieved by using a vector of counters, one for each of the processes. Assuming there are n processes,

  • Each process maintains a vector of clock counters V[1….n]
  • For process i, Vi[i] represents the number of local events at i. This is the local clock at i
  • For process i, Vi[j] represents the number of events at process j. This is hence the knowledge at i of the local clock at process j

The algorithm to update the clocks works very similar to the Lamport clocks.

  • A process Pi increments only the local clock on internal events
  • A process Pi increments the local clock on a send event and piggybacks the vector clock on to the message
  • A process Pi on receiving a message increments :
    • Vi[i] = Vi[i] + 1
    • Vi[j] = max(Vi[j], Vm[j]) where m = message and i ≠ j

The causality and concurrent relationships are defined by the following rules :

  • VC1 = VC2, iff  VC1[i] = VC2[i], for all i = 1 … N where VT represents the Vector timestamp or clock
  • VC1VC2, iff  VC1[i] ≤ VC2[i], for all i = 1 …. N
  •  Two events are causally related iff VC1 < VC2.  i.e. VC1 ≤ VC2 & there exists a j such that VC1[j] < VC2 [j] where 1 ≤ j ≤ N
  • Two events are concurrent iff  (NOT VC1 ≤ VC2 AND NOT VC2 ≤ VC1)

Below is an example that shows the vector timestamps for a set of processes and events.

As per the figure we can see that a →b, a →d, b → f, e → k etc and their vector timestamps reflect the causality relationship. A few concurrent relationships c || l, h || l, a || j etc and their vector timestamps reflect the concurrency.

A major issue with Vector clocks is the expense associated with maintaining the vectors both in terms of time  and space (O(N)) as it is dependent on the number of processes. There are several approaches that try to optimize on the vector size, but they mostly come at a cost of reduced strength of determining causality relationships. Interval Tree clocks is one such approach that I’ve come across but I’ve not really taken the time to look in depth. I’ve to mention Hybrid clocks as well as they are more useful, especially from a human debugging and traceability  perspective as they combine both a physical clock and a logical clock with every event. Having a physical clock component associated with timestamps certainly makes more sense to us rather than having a vector of integers associated with events, as in Logs used for debugging. So if you need the timestamps for human consumption, then it is worth looking into Hybrid clocks.

In conclusion vector clocks can be used to order events among a set of distributed processes such as microservices, especially when the messaging middleware does not support ordering. An example is when Kafka is used as the messaging bus. Kafka guarantees an ordered delivery only within a certain set of events, i.e events within an individual partition.  A system will usually have multiple such partitions. When trying to join event streams from different partitions of Kafka if an ordering is required between them, then it has to be done at a higher layer, as in the individual microservices using vector clocks.

Vector clocks or Version vectors ?

One last but nevertheless an extremely important point. There seems to be a lot of mix up when it comes to two similar concepts – vector clocks and version vectors. This blog post covers the differences in more detail.  In brief, Version vectors are used in many replicated database systems to establish partial ordering between them. It is sufficient in these cases to just track the state changes and history rather than the set of all events leading to state changes. Hence version vectors can be implemented with a smaller vector size compared to vector clocks. Version clocks on the other hand are used to establish ordering among events, so the lower bound on the size is still very much dependent on the number of processes.

A few examples

Riak datastore is one example of such a system. They have a series of blog posts that talk about how they use version vectors to identify conflicts and resolve them. Amazon Dynamo also uses (at least the Dynamo paper says so, don’t know what modifications it has under gone since the paper was published) version vectors. Cassandra on the other hand, uses synchronized clocks and applies the ‘Last write wins’ rule. Then there is Google’s TrueTime API which is akin to having a single global clock which is achieved by synchronizing all of their clocks based on GPS receivers and atomic clocks. (Contrary to the name ‘TrueTime’ it is not really that all clocks are at the exact same time but that it defines an upperbound on the offset between clocks (6ms or so, the last I checked)). This is used in Spanner ( global replicated database) and is also available for all as a cloud service. Amazon last year announced the Amazon Time Sync service that does something along the same lines I think (haven’t investigated it to comment further)

This post has ended up much lengthier that I had planned and I chalk it up to the breadth of this topic – there are so many interesting approaches and algorithms and it is an ever evolving area. Time flew as I put together this post and as they say “Time is precious, waste it wisely”. Well, I hope I did in this case.

One Reply to “Time Sense – Lamport and Vector Clocks”

Leave a Reply

Your email address will not be published. Required fields are marked *