# User Directory API Implementation

The user directory is maintained based on users that are 'visible' to the homeserver,
i.e. ones which are local to the server and ones which any local user shares a
room with.

The directory info is stored in various tables, which can sometimes get out of
sync (although this is considered a bug). If this happens, the current fix is
to use the [admin API](usage/administration/admin_api/background_updates.md#run)
to run the `regenerate_directory` job. This should then start a background task to
flush the current tables and regenerate the directory. Depending on the size
of your homeserver (number of users and rooms), this can take a while.
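
As a minimal sketch, the request that starts the `regenerate_directory` job could be built like this in Python. The endpoint path, the JSON body shape, the homeserver URL and the token placeholder are assumptions made for illustration; check them against the linked admin API documentation before use. The helper name `build_regenerate_request` is hypothetical.

```python
import json
import urllib.request


def build_regenerate_request(base_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that starts the regenerate_directory job."""
    # Assumed endpoint path; confirm it in the background updates admin API docs.
    url = base_url.rstrip("/") + "/_synapse/admin/v1/background_updates/start_job"
    body = json.dumps({"job_name": "regenerate_directory"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The token must belong to a server admin.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical homeserver URL and token placeholder, for illustration only.
    request = build_regenerate_request("http://localhost:8008", "<admin access token>")
    print(request.get_method(), request.full_url)
    # To actually start the job: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a live server.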

## Data model

There are five relevant tables that collectively form the "user directory".
Three of them track a list of all known users. The last two (collectively called
the "search tables") track which users are visible to each other.

From all of these tables we exclude three types of local user:

- support users
- appservice users
- deactivated users

A description of each table follows:

* `user_directory`. This contains the user ID, display name and avatar of each user.
  - Because there is only one directory entry per user, it is important that it
    only contain publicly visible information. Otherwise, this will leak the
    nickname or avatar used in a private room.
  - Indexed on rooms. Indexed on users.

* `user_directory_search`. To be joined to `user_directory`. It contains an extra
  column that enables full text search based on user IDs and display names.
  Different schemas for SQLite and Postgres are used.
  - Indexed on the full text search data. Indexed on users.

* `user_directory_stream_pos`. When the initial background update to populate
  the directory is complete, we record a stream position here. This indicates
  that Synapse should now listen for room changes and incrementally update
  the directory where necessary. (See [stream positions](development/synapse_architecture/streams.html).)

* `users_in_public_rooms`. Contains associations between users and the public
  rooms they're in. Used to determine which users are in public rooms and should
  be publicly visible in the directory. Both local and remote users are tracked.

* `users_who_share_private_rooms`. Rows are triples `(L, M, room id)` where `L`
  is a local user and `M` is a local or remote user. `L` and `M` should be
  different, but this isn't enforced by a constraint.

  Note that if two local users share a room then there will be two entries:
  `(user1, user2, !room_id)` and `(user2, user1, !room_id)`.
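
To illustrate the shape of `users_who_share_private_rooms`, the following Python sketch derives the `(L, M, room id)` rows for a single private room. It is not Synapse's implementation; the function name `private_room_rows` and the `members` mapping are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def private_room_rows(room_id: str, members: Dict[str, bool]) -> List[Tuple[str, str, str]]:
    """Return (L, M, room_id) rows for one private room.

    `members` maps each user ID to True if that user is local to this homeserver.
    """
    rows = []
    for user, other in permutations(sorted(members), 2):
        # L must be local; M may be local or remote. permutations() never
        # pairs a user with itself, so L != M holds by construction.
        if members[user]:
            rows.append((user, other, room_id))
    return rows
```

Two local users sharing a room produce one row in each direction, while a remote user only ever appears in the `M` position.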

## Configuration options

The exact way user search works can be tweaked via some server-level
[configuration options](usage/configuration/config_documentation.md#user_directory).

The information is not repeated here, but the options are mentioned below.

## Search algorithm

If `search_all_users` is `false`, then results are limited to users who:

1. Are found in the `users_in_public_rooms` table, or
2. Are found in the `users_who_share_private_rooms` table, where `L` is the requesting
   user and `M` is the search result.

Otherwise, if `search_all_users` is `true`, no such limits are placed and all
users known to the server (matching the search query) will be returned.

By default, locked users are not returned. If `show_locked_users` is `true` then
no filtering on the locked status of a user is done.
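
The visibility rules above can be sketched in Python as an in-memory filter. This is only an illustration of the logic, since the real filtering is done in SQL; the `Candidate` class, the `filter_candidates` function and the set-based stand-ins for the two tables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    user_id: str
    locked: bool = False


def filter_candidates(
    requester: str,
    candidates: Iterable[Candidate],
    public_users: Set[str],                 # stand-in for users_in_public_rooms
    private_shares: Set[Tuple[str, str]],   # (L, M) pairs from users_who_share_private_rooms
    search_all_users: bool = False,
    show_locked_users: bool = False,
) -> List[str]:
    """Return the user IDs the requester is allowed to see, in input order."""
    visible = []
    for candidate in candidates:
        # Locked users are hidden unless show_locked_users is enabled.
        if candidate.locked and not show_locked_users:
            continue
        if (
            search_all_users
            or candidate.user_id in public_users
            or (requester, candidate.user_id) in private_shares
        ):
            visible.append(candidate.user_id)
    return visible
```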

The user-provided search term is lowercased and normalized using [NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization).
This treats the string as case-insensitive, canonicalizes different forms of the
same text, and maps some "roughly equivalent" characters together.

The search term is then split into words:

* If [ICU](https://en.wikipedia.org/wiki/International_Components_for_Unicode) is
  available, then the system's [default locale](https://unicode-org.github.io/icu/userguide/locale/#default-locales)
  will be used to break the search term into words. (See the
  [installation instructions](setup/installation.md) for how to install ICU.)
* If unavailable, then runs of ASCII characters, numbers, underscores, and hyphens
  are considered words.
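
A rough Python sketch of the normalization step and the non-ICU fallback follows. The function names are hypothetical, and the regular expression reads "ASCII characters" as ASCII letters, which is an assumption about the intended character class rather than a copy of Synapse's code.

```python
import re
import unicodedata
from typing import List

# Assumption: "ASCII characters" is read here as ASCII letters.
_WORD_RE = re.compile(r"[A-Za-z0-9_\-]+")


def normalize_term(term: str) -> str:
    """Lowercase the term, then apply NFKC normalization."""
    return unicodedata.normalize("NFKC", term.lower())


def split_words_fallback(term: str) -> List[str]:
    """Split a search term into words without ICU."""
    return _WORD_RE.findall(normalize_term(term))
```

For example, NFKC folds compatibility characters such as the "ﬁ" ligature or fullwidth letters into their plain equivalents, so they match the same directory entries as their ordinary spellings.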

The queries for PostgreSQL and SQLite are detailed below, but their overall goal
is to find matching users, preferring users who are "real" (e.g. not bots,
not deactivated). It is assumed that real users will have a display name and
avatar set.

### PostgreSQL

The above words are then transformed into two queries:

1. "exact" which matches the parsed words exactly (using [`to_tsquery`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES));
2. "prefix" which matches the parsed words as prefixes (using `to_tsquery`).
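
As an illustration, the two `to_tsquery` input strings might be assembled as in the Python sketch below. Joining the words with `&` (all words must match) is an assumption, as is the function name `build_tsqueries`; the `:*` suffix is PostgreSQL's standard prefix-match syntax. The sketch assumes the words contain no tsquery operator characters and does no escaping.

```python
from typing import List, Tuple


def build_tsqueries(words: List[str]) -> Tuple[str, str]:
    """Return (exact, prefix) query strings suitable for to_tsquery."""
    # Assumption: every word must match, so words are combined with AND (&).
    exact = " & ".join(f"({word})" for word in words)
    # ":*" asks PostgreSQL to match any lexeme that starts with the word.
    prefix = " & ".join(f"({word}:*)" for word in words)
    return exact, prefix
```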

Results are composed of all rows in the `user_directory_search` table whose information
matches one (or both) of these queries. Results are ordered by calculating a weighted
score for each result; higher scores are returned first:

* 4x if a user ID exists.
* 1.2x if the user has a display name set.
* 1.2x if the user has an avatar set.
* 0x-3x by the full text search results using the [`ts_rank_cd` function](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING)
  against the "exact" search query; this has four variables with the following weightings:
  * `D`: 0.1 for the user ID's domain
  * `C`: 0.1 for unused
  * `B`: 0.9 for the user's display name (or an empty string if it is not set)
  * `A`: 0.1 for the user ID's localpart
* 0x-1x by the full text search results using the `ts_rank_cd` function against the
  "prefix" search query. (Using the same weightings as above.)
* If `prefer_local_users` is `true`, then 2x if the user is local to the homeserver.

Note that `ts_rank_cd` returns a weight between 0 and 1. The initial weighting of
all results is 1.
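
The weighting can be sketched as a small Python function. The text lists the "exact" and "prefix" contributions as separate items without saying how they combine; this sketch assumes they are added together before multiplying, so that a row matching only one of the two queries still gets a non-zero score. That combination, the function name and the parameter names are assumptions, and the two rank arguments stand in for `ts_rank_cd` results.

```python
def directory_score(
    has_display_name: bool,
    has_avatar: bool,
    exact_rank: float,    # ts_rank_cd result for the "exact" query, in [0, 1]
    prefix_rank: float,   # ts_rank_cd result for the "prefix" query, in [0, 1]
    is_local: bool = False,
    prefer_local_users: bool = False,
    has_user_id: bool = True,
) -> float:
    """Compute an ordering score; higher scores sort first."""
    score = 1.0  # initial weighting of every result
    if has_user_id:
        score *= 4.0
    if has_display_name:
        score *= 1.2
    if has_avatar:
        score *= 1.2
    # Assumption: the 0x-3x and 0x-1x terms are summed, not multiplied.
    score *= 3.0 * exact_rank + prefix_rank
    if prefer_local_users and is_local:
        score *= 2.0
    return score
```

Under these assumptions, a local user with a display name, an avatar and perfect ranks on both queries scores 4 x 1.2 x 1.2 x (3 + 1) x 2 when `prefer_local_users` is enabled.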

### SQLite

Results are composed of all rows in the `user_directory_search` table whose information
matches the query. Results are ordered by the following information, with each
subsequent column used as a tiebreaker, for each result:

1. By the [`rank`](https://www.sqlite.org/windowfunctions.html#built_in_window_functions)
   of the full text search results using the [`matchinfo` function](https://www.sqlite.org/fts3.html#matchinfo). Higher
   ranks are returned first.
2. If `prefer_local_users` is `true`, then users local to the homeserver are
   returned first.
3. Users with a display name set are returned first.
4. Users with an avatar set are returned first.
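
The tiebreaker ordering can be pictured as a tuple sort key, as in the Python sketch below. The real ordering is done in SQL; the `Row` class, its fields and the `order_results` function are hypothetical and only illustrate the precedence of the four criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    user_id: str
    rank: float            # stand-in for the full text search rank
    is_local: bool
    display_name: Optional[str] = None
    avatar_url: Optional[str] = None


def order_results(rows: List[Row], prefer_local_users: bool = False) -> List[Row]:
    """Sort rows by rank, then locality (optional), display name, avatar."""

    def key(row: Row):
        return (
            -row.rank,                                           # higher rank first
            (not row.is_local) if prefer_local_users else False, # local users first
            row.display_name is None,                            # named users first
            row.avatar_url is None,                              # users with avatars first
        )

    return sorted(rows, key=key)
```

Because Python compares tuples element by element and `False` sorts before `True`, each later field only matters when all earlier fields are equal.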