《Time, Clocks, and the Ordering of Events in a Distributed System》 阅读笔记。


论文地址

1. The Partial Ordering

System model

  • The system is composed of a collection of processes
  • Each process consists of a sequence of events
  • Sending or receiving a message is an event in a process

Happened before

The relation “happened before” (denoted by $\rightarrow$) on the set of events of a system is the smallest relation satisfying the following three conditions:

  • if $a$ and $b$ are events in the same process, and $a$ comes before $b$, then $a \rightarrow b$
  • if $a$ is the sending of a message by one process and $b$ is the receipt of the same message by another process, then $a \rightarrow b$
  • if $a \rightarrow b$ and $b \rightarrow c$ then $a \rightarrow c$

Two distinct events $a$ and $b$ are said to be concurrent if $a \nrightarrow b$ and $b \nrightarrow a$. We assume that $a \nrightarrow a$ for any event $a$.

Causally affect

Another way of viewing the definition is to say that $a \to b$ means that it is possible for event $a$ to causally affect event $b$.

Compare to special relativity

  • In relativity, the ordering of events is defined in terms of messages that could be sent.
  • In “happened before” relation, only considering messages that actually are send.

    也就是说,在分布式系统的 “happened before” 中不考虑隐式因果关系.

2. Logical Clocks

Definition

  • We define a clock $C_i$ for each process $P_i$ to be a function which assigns a number $C_i(a)$ to any event $a$ in that process.
  • The entire system of clocks is represented by the function $C$ which assigns to any event $b$ the number $C(b)$, where $C(b) = C_j(b)$ if $b$ is an event in process $P_j$

Clock Condition

Clock Condition: For any event $a, b$: if $a \to b$ then $C(a) < C(b)$.

  • based on the order in which events occur.
  • $C(a) < C(b) \nRightarrow a \to b$, since that would imply that any two concurrent events must occur at the same time.

It is easy to see from our definition of the relation $\to$ that the Clock Condition is satisfied if the following two conditions hold:

  • C1: If $a$ and $b$ are events in process $P_i$, and $a$ comes before $b$, then $C_i(a) < C_i(b)$
  • C2: If $a$ is the sending of a message by process $P_i$ and $b$ is the receipt of that message by pocess $P_j$, then $C_i(a) < C_j(b)$

Implementation Rule of Clock Condition

  • IR1 (meet condition C1): Each process $P_i$ increments $C_i$ between any two successive events
  • IR2 (meet condition C2)
    • If event $a$ is the sending of a message $m$ by process $P_i$, then the message $m$ contains a timestamp $T_m = C_i(a)$
    • Upon receiving a message $m$, process $P_j$ sets $C_j$ greater than or equal to its present value and greater than $T_m$

3. Ordering the Events Totally

至今为止,我们只能得到事件的偏序关系,为了得到事件的全局顺序,我们可通过附加一个 processes 之间的比较关系 $\prec$ 达成目的。

More precisely, we define a relation $\Rightarrow$ as follows: if $a$ is an event in process $P_i$ and $b$ is an event in process $P_j$, then $a \Rightarrow b$ if and only if either

  • $C_i(a) < C_i(b)$

    Implies that if $a \to b$ then $a \Rightarrow b$

  • $C_i(a) = C_j(b)$ and $P_i \prec P_j$

    The oridering $\Rightarrow$ depends upon the system of clocks $C_i$, and is not unique.

4. Strong Clock Condition

因为所构建的系统可能没有包含所有的因果关系,因此可能所构建系统内的事件顺序和实际上的事件顺序不符。因此,为了和实际相符,逻辑时钟需满足以下约束:

Strong Clock Condition: For any events $a, b$ in $\varphi$, if $a \leadsto b$ then $C(a) < C(b)$.

where,

  • $\varphi$ denote the set of all system events.
  • $\leadsto$ denote the “happened before” relation for $\bar{\varphi}$
    • $\bar{\varphi}$ denote the set of events which contains the events in $\varphi$ together with all other relevant external events
    • $a \leadsto b \nRightarrow a \to b$

5. Physical Clocks

For mathematical convenience, we assume that $C_i(t)$ is a continuous, differentiable function of $t$ except for isolated jump discontinuities where the clock is reset.

Physical Clock Condition

  • PC1(correct): There exists a constant $\kappa \ll 1$, such that for all $i$: $|\frac{dC_i(t)}{dt} - 1| < \kappa$
  • PC2(synchronized): There must be a sufficiently small constant $\epsilon$, such that for all $i, j$: $|C_i(t) - C_j(t)| < \epsilon$

Implementation Rule

Let $m$ be a message which is sent at physical time $t$ and received at time $t'$. We define $v_m = t' - t$ to be the total delay of the message $m$. This delay will, of course, not be known to the process which receives $m$. However, we assume that the receiving process knows some minimum delay $\mu_m > 0$ such that $\mu_m \le v_m$. We call $\xi_m = v_m - \mu_m$ the unpredictable delay of the message.

We now specialize rules IR1 and IR2 for our physical clocks as follows:

  • PCIR1: For each $i$, if $P_i$ does not receive a message at physical time $t$, then $C_i$ is differentiable at $t$ and $\frac{dC_i(t)}{dt} > 0$.
  • PCIR2:
    • If $P_i$ sends a message $m$ at physical time $t$, then $m$ contains a timestamp $T_m = C_i(t)$
    • Upon receiving a message $m$ at time $t'$, process $P_j$ sets $C_j(t')$ equal to maximum $(C_j(t' - 0), T_m + \mu_m)$

      $C_j(t' - 0) = lim_{\delta \to 0} C_j(t' - |\delta|)$

THEOREM

Assume a strongly connected graph of processs with diameter $d$ which always obeys rules PCIR1 and PCIR2. Assume that for any message $m$, $\mu_m \le \mu$ for some constant $\mu$, and that for all $t \ge t_0$:

  • PC1 holds
  • There are constants $\tau$ and $\xi$ such that every $\tau$ seconds a message with an unpredictable delay less than $\xi$ is sent over every arc. Then PC2 is satisfied with $\epsilon \approx d(2\kappa \tau + \xi)$ for all $t \gtrsim t_0 + \tau d$, where the approximations assume $\mu + \xi \ll \tau$