# How to monitor Synapse metrics using Prometheus

1.  Install Prometheus:

    Follow instructions at
    <http://prometheus.io/docs/introduction/install/>

1.  Enable Synapse metrics:

    In `homeserver.yaml`, make sure `enable_metrics` is
    set to `True`.

1.  Enable the `/_synapse/metrics` Synapse endpoint that Prometheus uses to
    collect data:

    There are two methods of enabling the metrics endpoint in Synapse.

    The first serves the metrics as a part of the usual web server and
    can be enabled by adding the `metrics` resource to an existing
    listener, as in this example:

    ```yaml
    listeners:
      - port: 8008
        tls: false
        type: http
        x_forwarded: true
        bind_addresses: ['::1', '127.0.0.1']
        resources:
          # added "metrics" in this line
          - names: [client, federation, metrics]
            compress: false
    ```

    This provides a simple way of adding metrics to your Synapse
    installation, and serves them under `/_synapse/metrics`. If you do not
    want your metrics to be publicly exposed, you will need to either
    filter the endpoint out at your load balancer, or use the second method.

    The second method runs the metrics server on a different port, in a
    different thread to Synapse. This can make it more resilient to
    heavy load that would otherwise prevent metrics from being retrieved,
    and makes it easier to expose the metrics to internal networks only.
    The served metrics are available over HTTP only, at `/_synapse/metrics`.

    Add a new listener to `homeserver.yaml` as in this example:

    ```yaml
    listeners:
      - port: 8008
        tls: false
        type: http
        x_forwarded: true
        bind_addresses: ['::1', '127.0.0.1']
        resources:
          - names: [client, federation]
            compress: false

      # beginning of the new metrics listener
      - port: 9000
        type: metrics
        bind_addresses: ['::1', '127.0.0.1']
    ```

1.  Restart Synapse.

1.  Add a Prometheus target for Synapse.

    It needs to set the `metrics_path` to a non-default value (under
    `scrape_configs`):

    ```yaml
      - job_name: "synapse"
        scrape_interval: 15s
        metrics_path: "/_synapse/metrics"
        static_configs:
          - targets: ["my.server.here:port"]
    ```

    where `my.server.here` is the IP address of Synapse, and `port` is
    the listener port configured with the `metrics` resource.

    If your Prometheus is older than 1.5.2, you will need to replace
    `static_configs` in the above with `target_groups`.

1.  Restart Prometheus.

1.  Consider using the [Grafana dashboard](https://github.com/matrix-org/synapse/tree/master/contrib/grafana/)
    and required [recording rules](https://github.com/matrix-org/synapse/tree/master/contrib/prometheus/).
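
Before relying on the Prometheus target, it can help to check by hand that the endpoint answers and exposes the metrics you expect. The following is a minimal Python sketch, not part of Synapse: `fetch_metrics` and `metric_names` are hypothetical helper names, and the parser handles only the simple `name{labels} value` sample lines of the Prometheus text format.

```python
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Download the raw text exposition served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition):
    """Return the distinct sample names found in a text exposition."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="x"} value` or `name value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


# Made-up sample text, used here only to show the parsing step.
SAMPLE = """\
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.5
example_requests_total{method="GET"} 3
"""
print(sorted(metric_names(SAMPLE)))
```

Against a running server you would pass the listener address instead of the sample, for example `metric_names(fetch_metrics("http://127.0.0.1:9000/_synapse/metrics"))` for the dedicated metrics listener shown earlier; the address is an assumption based on that example.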

## Monitoring workers

To monitor a Synapse installation using [workers](workers.md),
every worker needs to be monitored independently, in addition to
the main homeserver process. This is because workers don't send
their metrics to the main homeserver process, but expose them
directly (if they are configured to do so).

To allow collecting metrics from a worker, you need to add a
`metrics` listener to its configuration, by adding the following
under `worker_listeners`:

```yaml
  - type: metrics
    bind_address: ''
    port: 9101
```

The `bind_address` and `port` parameters should be set so that
the resulting listener can be reached by Prometheus, and so that they
don't clash with an existing worker.
With this example, the worker's metrics would then be available
on `http://127.0.0.1:9101`.

Example Prometheus target for Synapse with workers:

```yaml
  - job_name: "synapse"
    scrape_interval: 15s
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["my.server.here:port"]
        labels:
          instance: "my.server"
          job: "master"
          index: 1
      - targets: ["my.workerserver.here:port"]
        labels:
          instance: "my.server"
          job: "generic_worker"
          index: 1
      - targets: ["my.workerserver.here:port"]
        labels:
          instance: "my.server"
          job: "generic_worker"
          index: 2
      - targets: ["my.workerserver.here:port"]
        labels:
          instance: "my.server"
          job: "media_repository"
          index: 1
```

Labels (`instance`, `job`, `index`) can be defined as anything.
The labels are used to group graphs in Grafana.
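
With many workers, writing one `static_configs` entry per process by hand gets repetitive. The sketch below is one possible way to generate the entries; `static_configs` is a hypothetical helper, and the host names and ports are placeholders rather than values Synapse prescribes. It prints JSON, which should also parse as YAML flow syntax when pasted under `static_configs`.

```python
import json


def static_configs(instance, processes):
    """Build one entry per (target, job, index) tuple."""
    entries = []
    for target, job, index in processes:
        entries.append(
            {
                "targets": [target],
                # Prometheus label values are strings, so the index is
                # converted explicitly.
                "labels": {"instance": instance, "job": job, "index": str(index)},
            }
        )
    return entries


# Placeholder targets: replace them with your own hosts and listener ports.
PROCESSES = [
    ("my.server.here:9000", "master", 1),
    ("my.workerserver.here:9101", "generic_worker", 1),
    ("my.workerserver.here:9102", "generic_worker", 2),
    ("my.workerserver.here:9103", "media_repository", 1),
]

print(json.dumps(static_configs("my.server", PROCESSES), indent=2))
```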

## Renaming of metrics & deprecation of old names in 1.2

Synapse 1.2 updates the Prometheus metrics to match the naming
convention of the upstream `prometheus_client`. The old names are
considered deprecated and will be removed in a future version of
Synapse.

**The old names will be disabled by default in Synapse v1.71.0 and removed
altogether in Synapse v1.73.0.**

| New Name | Old Name |
| -------- | -------- |
| python_gc_objects_collected_total | python_gc_objects_collected |
| python_gc_objects_uncollectable_total | python_gc_objects_uncollectable |
| python_gc_collections_total | python_gc_collections |
| process_cpu_seconds_total | process_cpu_seconds |
| synapse_federation_client_sent_transactions_total | synapse_federation_client_sent_transactions |
| synapse_federation_client_events_processed_total | synapse_federation_client_events_processed |
| synapse_event_processing_loop_count_total | synapse_event_processing_loop_count |
| synapse_event_processing_loop_room_count_total | synapse_event_processing_loop_room_count |
| synapse_util_caches_cache_hits | synapse_util_caches_cache:hits |
| synapse_util_caches_cache_size | synapse_util_caches_cache:size |
| synapse_util_caches_cache_evicted_size | synapse_util_caches_cache:evicted_size |
| synapse_util_caches_cache | synapse_util_caches_cache:total |
| synapse_util_caches_response_cache_size | synapse_util_caches_response_cache:size |
| synapse_util_caches_response_cache_hits | synapse_util_caches_response_cache:hits |
| synapse_util_caches_response_cache_evicted_size | synapse_util_caches_response_cache:evicted_size |
| synapse_util_metrics_block_count_total | synapse_util_metrics_block_count |
| synapse_util_metrics_block_time_seconds_total | synapse_util_metrics_block_time_seconds |
| synapse_util_metrics_block_ru_utime_seconds_total | synapse_util_metrics_block_ru_utime_seconds |
| synapse_util_metrics_block_ru_stime_seconds_total | synapse_util_metrics_block_ru_stime_seconds |
| synapse_util_metrics_block_db_txn_count_total | synapse_util_metrics_block_db_txn_count |
| synapse_util_metrics_block_db_txn_duration_seconds_total | synapse_util_metrics_block_db_txn_duration_seconds |
| synapse_util_metrics_block_db_sched_duration_seconds_total | synapse_util_metrics_block_db_sched_duration_seconds |
| synapse_background_process_start_count_total | synapse_background_process_start_count |
| synapse_background_process_ru_utime_seconds_total | synapse_background_process_ru_utime_seconds |
| synapse_background_process_ru_stime_seconds_total | synapse_background_process_ru_stime_seconds |
| synapse_background_process_db_txn_count_total | synapse_background_process_db_txn_count |
| synapse_background_process_db_txn_duration_seconds_total | synapse_background_process_db_txn_duration_seconds |
| synapse_background_process_db_sched_duration_seconds_total | synapse_background_process_db_sched_duration_seconds |
| synapse_storage_events_persisted_events_total | synapse_storage_events_persisted_events |
| synapse_storage_events_persisted_events_sep_total | synapse_storage_events_persisted_events_sep |
| synapse_storage_events_state_delta_total | synapse_storage_events_state_delta |
| synapse_storage_events_state_delta_single_event_total | synapse_storage_events_state_delta_single_event |
| synapse_storage_events_state_delta_reuse_delta_total | synapse_storage_events_state_delta_reuse_delta |
| synapse_federation_server_received_pdus_total | synapse_federation_server_received_pdus |
| synapse_federation_server_received_edus_total | synapse_federation_server_received_edus |
| synapse_handler_presence_notified_presence_total | synapse_handler_presence_notified_presence |
| synapse_handler_presence_federation_presence_out_total | synapse_handler_presence_federation_presence_out |
| synapse_handler_presence_presence_updates_total | synapse_handler_presence_presence_updates |
| synapse_handler_presence_timers_fired_total | synapse_handler_presence_timers_fired |
| synapse_handler_presence_federation_presence_total | synapse_handler_presence_federation_presence |
| synapse_handler_presence_bump_active_time_total | synapse_handler_presence_bump_active_time |
| synapse_federation_client_sent_edus_total | synapse_federation_client_sent_edus |
| synapse_federation_client_sent_pdu_destinations_count_total | synapse_federation_client_sent_pdu_destinations:count |
| synapse_federation_client_sent_pdu_destinations_total | synapse_federation_client_sent_pdu_destinations:total |
| synapse_handlers_appservice_events_processed_total | synapse_handlers_appservice_events_processed |
| synapse_notifier_notified_events_total | synapse_notifier_notified_events |
| synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter_total | synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter |
| synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter_total | synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter |
| synapse_http_httppusher_http_pushes_processed_total | synapse_http_httppusher_http_pushes_processed |
| synapse_http_httppusher_http_pushes_failed_total | synapse_http_httppusher_http_pushes_failed |
| synapse_http_httppusher_badge_updates_processed_total | synapse_http_httppusher_badge_updates_processed |
| synapse_http_httppusher_badge_updates_failed_total | synapse_http_httppusher_badge_updates_failed |
| synapse_admin_mau_current | synapse_admin_mau:current |
| synapse_admin_mau_max | synapse_admin_mau:max |
| synapse_admin_mau_registered_reserved_users | synapse_admin_mau:registered_reserved_users |
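
Dashboards and alert rules written against the old names need their queries updated. As a rough illustration, the Python sketch below rewrites old names in a PromQL expression using a handful of pairs copied from the table above; `RENAMES` and `migrate_expr` are hypothetical names, and the mapping is deliberately incomplete. It matches whole metric-name tokens, so a name that has already been migrated (for example `process_cpu_seconds_total`) is left untouched.

```python
import re

# A few old -> new pairs from the table above; extend as needed.
RENAMES = {
    "process_cpu_seconds": "process_cpu_seconds_total",
    "python_gc_collections": "python_gc_collections_total",
    "synapse_util_caches_cache:hits": "synapse_util_caches_cache_hits",
    "synapse_util_caches_cache:total": "synapse_util_caches_cache",
    "synapse_admin_mau:current": "synapse_admin_mau_current",
    "synapse_admin_mau:max": "synapse_admin_mau_max",
}

# Metric names are made of letters, digits, underscores and colons,
# and do not start with a digit.
_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def migrate_expr(expr):
    """Replace every whole token that is a known old name."""
    return _NAME.sub(lambda m: RENAMES.get(m.group(0), m.group(0)), expr)


print(migrate_expr("rate(process_cpu_seconds[5m])"))
print(migrate_expr("synapse_admin_mau:current / synapse_admin_mau:max"))
```

Note that this simple token match would also rewrite an old name that appears inside a quoted label value, so review the result before saving it.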

Removal of deprecated metrics & time based counters becoming histograms in 0.31.0
---------------------------------------------------------------------------------

The duplicated metrics deprecated in Synapse 0.27.0 have been removed.

All time duration-based metrics are now reported in seconds rather than
milliseconds. This affects:

| msec -> sec metrics |
| ------------------- |
| python_gc_time |
| python_twisted_reactor_tick_time |
| synapse_storage_query_time |
| synapse_storage_schedule_time |
| synapse_storage_transaction_time |

Several metrics have been changed to be histograms, which sort entries
into buckets and allow better analysis. The following metrics are now
histograms:

| Altered metrics |
| --------------- |
| python_gc_time |
| python_twisted_reactor_pending_calls |
| python_twisted_reactor_tick_time |
| synapse_http_server_response_time_seconds |
| synapse_storage_query_time |
| synapse_storage_schedule_time |
| synapse_storage_transaction_time |
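
To make the histogram change concrete, the sketch below shows how a set of timings ends up as cumulative buckets plus a sum and a count, which is the general shape of a Prometheus histogram. The observations and bucket bounds are invented for illustration and are not the buckets Synapse uses; `to_histogram` is a hypothetical helper.

```python
import math


def to_histogram(observations, bounds):
    """Summarise observations as cumulative buckets, a sum and a count."""
    upper = sorted(bounds) + [math.inf]
    # Buckets are cumulative: each one counts observations <= its bound.
    buckets = {le: sum(1 for v in observations if v <= le) for le in upper}
    return {"buckets": buckets, "sum": sum(observations), "count": len(observations)}


# Invented timings in milliseconds, converted to seconds first.
times_ms = [4, 20, 80, 300, 1200]
times_s = [t / 1000 for t in times_ms]

hist = to_histogram(times_s, bounds=[0.005, 0.05, 0.5])
for le, count in hist["buckets"].items():
    print(f"le={le}: {count}")
print("count:", hist["count"])
```

Because the buckets are cumulative, the last (`inf`) bucket always equals the total count, and a query can estimate quantiles from the bucket counts instead of relying on a single average.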

Block and response metrics renamed for 0.27.0
---------------------------------------------

Synapse 0.27.0 begins the process of rationalising the duplicate
`*:count` metrics reported for the resource tracking for code blocks and
HTTP requests.

At the same time, the corresponding `*:total` metrics are being renamed,
as the `:total` suffix no longer makes sense in the absence of a
corresponding `:count` metric.

To enable a graceful migration path, this release just adds new names
for the metrics being renamed. A future release will remove the old
ones.

The following table shows the new metrics, and the old metrics which
they are replacing.

| New name | Old name |
| -------- | -------- |
| synapse_util_metrics_block_count | synapse_util_metrics_block_timer:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_ru_utime:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_ru_stime:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_db_txn_count:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_db_txn_duration:count |
| synapse_util_metrics_block_time_seconds | synapse_util_metrics_block_timer:total |
| synapse_util_metrics_block_ru_utime_seconds | synapse_util_metrics_block_ru_utime:total |
| synapse_util_metrics_block_ru_stime_seconds | synapse_util_metrics_block_ru_stime:total |
| synapse_util_metrics_block_db_txn_count | synapse_util_metrics_block_db_txn_count:total |
| synapse_util_metrics_block_db_txn_duration_seconds | synapse_util_metrics_block_db_txn_duration:total |
| synapse_http_server_response_count | synapse_http_server_requests |
| synapse_http_server_response_count | synapse_http_server_response_time:count |
| synapse_http_server_response_count | synapse_http_server_response_ru_utime:count |
| synapse_http_server_response_count | synapse_http_server_response_ru_stime:count |
| synapse_http_server_response_count | synapse_http_server_response_db_txn_count:count |
| synapse_http_server_response_count | synapse_http_server_response_db_txn_duration:count |
| synapse_http_server_response_time_seconds | synapse_http_server_response_time:total |
| synapse_http_server_response_ru_utime_seconds | synapse_http_server_response_ru_utime:total |
| synapse_http_server_response_ru_stime_seconds | synapse_http_server_response_ru_stime:total |
| synapse_http_server_response_db_txn_count | synapse_http_server_response_db_txn_count:total |
| synapse_http_server_response_db_txn_duration_seconds | synapse_http_server_response_db_txn_duration:total |

Standard Metric Names
---------------------

As of Synapse version 0.18.2, the format of the process-wide metrics has
been changed to fit Prometheus standard naming conventions. Additionally,
the units have been changed from milliseconds to seconds.

| New name | Old name |
| -------- | -------- |
| process_cpu_user_seconds_total | process_resource_utime / 1000 |
| process_cpu_system_seconds_total | process_resource_stime / 1000 |
| process_open_fds (no 'type' label) | process_fds |

The Python-specific counts of garbage collector performance have been
renamed.

| New name | Old name |
| -------- | -------- |
| python_gc_time | reactor_gc_time |
| python_gc_unreachable_total | reactor_gc_unreachable |
| python_gc_counts | reactor_gc_counts |

The Twisted-specific reactor metrics have been renamed.

| New name | Old name |
| -------- | -------- |
| python_twisted_reactor_pending_calls | reactor_pending_calls |
| python_twisted_reactor_tick_time | reactor_tick_time |
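
When comparing data recorded before and after 0.18.2, the process-wide values need both a new name and a unit conversion. The sketch below applies the process-wide table from this section to a single sample; `PROCESS_RENAMES` and `convert_sample` are hypothetical names, and the input values are invented.

```python
# Old name -> (new name, divisor), taken from the process-wide table.
PROCESS_RENAMES = {
    "process_resource_utime": ("process_cpu_user_seconds_total", 1000),
    "process_resource_stime": ("process_cpu_system_seconds_total", 1000),
    "process_fds": ("process_open_fds", 1),
}


def convert_sample(name, value):
    """Return (new_name, converted_value); unknown names pass through."""
    new_name, divisor = PROCESS_RENAMES.get(name, (name, 1))
    return new_name, value / divisor


# 2500 ms of user CPU time under the old name becomes 2.5 s under the new one.
print(convert_sample("process_resource_utime", 2500))
```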