Ви не можете вибрати більше 25 тем Теми мають розпочинатися з літери або цифри, можуть містити дефіси (-) і не повинні перевищувати 35 символів.

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141
  1. # Auth Chain Difference Algorithm
  2. The auth chain difference algorithm is used by V2 state resolution, where a
  3. naive implementation can be a significant source of CPU and DB usage.
  4. ### Definitions
  5. A *state set* is a set of state events; e.g. the input of a state resolution
  6. algorithm is a collection of state sets.
  7. The *auth chain* of a set of events are all the events' auth events and *their*
  8. auth events, recursively (i.e. the events reachable by walking the graph induced
  9. by an event's auth events links).
  10. The *auth chain difference* of a collection of state sets is the union minus the
  11. intersection of the sets of auth chains corresponding to the state sets, i.e an
  12. event is in the auth chain difference if it is reachable by walking the auth
  13. event graph from at least one of the state sets but not from *all* of the state
  14. sets.
  15. ## Breadth First Walk Algorithm
  16. A way of calculating the auth chain difference without calculating the full auth
  17. chains for each state set is to do a parallel breadth first walk (ordered by
  18. depth) of each state set's auth chain. By tracking which events are reachable
  19. from each state set we can finish early if every pending event is reachable from
  20. every state set.
  21. This can work well for state sets that have a small auth chain difference, but
  22. can be very inefficient for larger differences. However, this algorithm is still
  23. used if we don't have a chain cover index for the room (e.g. because we're in
  24. the process of indexing it).
  25. ## Chain Cover Index
  26. Synapse computes auth chain differences by pre-computing a "chain cover" index
  27. for the auth chain in a room, allowing us to efficiently make reachability queries
  28. like "is event `A` in the auth chain of event `B`?". We could do this with an index
  29. that tracks all pairs `(A, B)` such that `A` is in the auth chain of `B`. However, this
  30. would be prohibitively large, scaling poorly as the room accumulates more state
  31. events.
  32. Instead, we break down the graph into *chains*. A chain is a subset of a DAG
  33. with the following property: for any pair of events `E` and `F` in the chain,
  34. the chain contains a path `E -> F` or a path `F -> E`. This forces a chain to be
  35. linear (without forks), e.g. `E -> F -> G -> ... -> H`. Each event in the chain
  36. is given a *sequence number* local to that chain. The oldest event `E` in the
  37. chain has sequence number 1. If `E` has a child `F` in the chain, then `F` has
  38. sequence number 2. If `E` has a grandchild `G` in the chain, then `G` has
  39. sequence number 3; and so on.
  40. Synapse ensures that each persisted event belongs to exactly one chain, and
  41. tracks how the chains are connected to one another. This allows us to
  42. efficiently answer reachability queries. Doing so uses less storage than
  43. tracking reachability on an event-by-event basis, particularly when we have
  44. fewer and longer chains. See
  45. > Jagadish, H. (1990). [A compression technique to materialize transitive closure](https://doi.org/10.1145/99935.99944).
  46. > *ACM Transactions on Database Systems (TODS)*, 15*(4)*, 558-598.
  47. for the original idea or
  48. > Y. Chen, Y. Chen, [An efficient algorithm for answering graph
  49. > reachability queries](https://doi.org/10.1109/ICDE.2008.4497498),
  50. > in: 2008 IEEE 24th International Conference on Data Engineering, April 2008,
  51. > pp. 893–902. (PDF available via [Google Scholar](https://scholar.google.com/scholar?q=Y.%20Chen,%20Y.%20Chen,%20An%20efficient%20algorithm%20for%20answering%20graph%20reachability%20queries,%20in:%202008%20IEEE%2024th%20International%20Conference%20on%20Data%20Engineering,%20April%202008,%20pp.%20893902.).)
  52. for a more modern take.
  53. In practical terms, the chain cover assigns every event a
  54. *chain ID* and *sequence number* (e.g. `(5,3)`), and maintains a map of *links*
  55. between events in chains (e.g. `(5,3) -> (2,4)`) such that `A` is reachable by `B`
  56. (i.e. `A` is in the auth chain of `B`) if and only if either:
  57. 1. `A` and `B` have the same chain ID and `A`'s sequence number is less than `B`'s
  58. sequence number; or
  59. 2. there is a link `L` between `B`'s chain ID and `A`'s chain ID such that
  60. `L.start_seq_no` <= `B.seq_no` and `A.seq_no` <= `L.end_seq_no`.
  61. There are actually two potential implementations, one where we store links from
  62. each chain to every other reachable chain (the transitive closure of the links
  63. graph), and one where we remove redundant links (the transitive reduction of the
  64. links graph) e.g. if we have chains `C3 -> C2 -> C1` then the link `C3 -> C1`
  65. would not be stored. Synapse uses the former implementation so that it doesn't
  66. need to recurse to test reachability between chains. This trades-off extra storage
  67. in order to save CPU cycles and DB queries.
  68. ### Example
  69. An example auth graph would look like the following, where chains have been
  70. formed based on type/state_key and are denoted by colour and are labelled with
  71. `(chain ID, sequence number)`. Links are denoted by the arrows (links in grey
  72. are those that would be remove in the second implementation described above).
  73. ![Example](auth_chain_diff.dot.png)
  74. Note that we don't include all links between events and their auth events, as
  75. most of those links would be redundant. For example, all events point to the
  76. create event, but each chain only needs the one link from it's base to the
  77. create event.
  78. ## Using the Index
  79. This index can be used to calculate the auth chain difference of the state sets
  80. by looking at the chain ID and sequence numbers reachable from each state set:
  81. 1. For every state set lookup the chain ID/sequence numbers of each state event
  82. 2. Use the index to find all chains and the maximum sequence number reachable
  83. from each state set.
  84. 3. The auth chain difference is then all events in each chain that have sequence
  85. numbers between the maximum sequence number reachable from *any* state set and
  86. the minimum reachable by *all* state sets (if any).
  87. Note that steps 2 is effectively calculating the auth chain for each state set
  88. (in terms of chain IDs and sequence numbers), and step 3 is calculating the
  89. difference between the union and intersection of the auth chains.
  90. ### Worked Example
  91. For example, given the above graph, we can calculate the difference between
  92. state sets consisting of:
  93. 1. `S1`: Alice's invite `(4,1)` and Bob's second join `(2,2)`; and
  94. 2. `S2`: Alice's second join `(4,3)` and Bob's first join `(2,1)`.
  95. Using the index we see that the following auth chains are reachable from each
  96. state set:
  97. 1. `S1`: `(1,1)`, `(2,2)`, `(3,1)` & `(4,1)`
  98. 2. `S2`: `(1,1)`, `(2,1)`, `(3,2)` & `(4,3)`
  99. And so, for each the ranges that are in the auth chain difference:
  100. 1. Chain 1: None, (since everything can reach the create event).
  101. 2. Chain 2: The range `(1, 2]` (i.e. just `2`), as `1` is reachable by all state
  102. sets and the maximum reachable is `2` (corresponding to Bob's second join).
  103. 3. Chain 3: Similarly the range `(1, 2]` (corresponding to the second power
  104. level).
  105. 4. Chain 4: The range `(1, 3]` (corresponding to both of Alice's joins).
  106. So the final result is: Bob's second join `(2,2)`, the second power level
  107. `(3,2)` and both of Alice's joins `(4,2)` & `(4,3)`.