Deep dive into postgres stats: pg_stat_database
Everything you always wanted to know about Postgres stats
You can find a full description of the view’s columns in the official documentation, so here I will focus on the types of problems it helps us solve:
- Cache hit ratio.
- Commit ratio.
- Database anomalies.
- Load distribution.
Cache hit ratio. For cache hit ratio, the blks_hit and blks_read counters are used. Note that both are bigint, so plain division would be truncated to an integer; casting to numeric gives a precise ratio. Here is a per-database query:
SELECT
  datname, round(100 * blks_hit::numeric / (blks_hit + blks_read), 2) as cache_hit_ratio
FROM pg_stat_database WHERE (blks_hit + blks_read) > 0;
or a summary across all databases:
SELECT
  round(100 * sum(blks_hit) / sum(blks_hit + blks_read), 3) as cache_hit_ratio
FROM pg_stat_database;
The sweet spot here is values close to 100: it means that almost all the necessary data was read from shared buffers. Values near 90 show that postgres reads from disk from time to time, and values below 80 indicate an insufficient amount of shared buffers or physical RAM: the data required by the most frequently called queries doesn’t fit into memory, so postgres has to read it from disk. It’s not too bad if this data is in the OS page cache; if it isn’t there, it’s a bad scenario. The basic remedy is to increase the amount of shared buffers, and a good starting point is 25% of the available RAM. When all databases are able to fit in RAM, a good starting point is to allocate 80% of the available RAM.
Note that when postgres has just been restarted and is actively filling the buffer cache, it shows a low cache hit ratio, and this is normal behaviour.
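Keep in mind that these counters accumulate since the last statistics reset, so old numbers can mask recent changes. If you want to observe the ratio over a recent interval, you can zero the counters for the current database first:

```sql
-- Reset the statistics counters of the current database
-- (requires superuser privileges; all counters start from zero afterwards).
SELECT pg_stat_reset();
```

After a reset, let the workload run for a while before drawing conclusions, since a freshly reset cache hit ratio behaves just like one after a restart.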
Commit ratio. Commit ratio is similar to cache hit ratio, but it shows the share of successful operations instead. As is well known, changes made by transactions may be aborted (rolled back) or committed. If a rollback isn’t properly handled by an application, it can be considered an error. Also, single queries that fail outside of transactions are accounted as rollbacks. So, in general, the commit ratio lets us estimate the amount of errors in a particular database. Commit ratio is calculated from the xact_commit and xact_rollback counters, using queries like the ones above. Here is an example which shows per-database results:
SELECT
  datname, round(100 * xact_commit::numeric / (xact_commit + xact_rollback), 2) as commit_ratio
FROM pg_stat_database WHERE (xact_commit + xact_rollback) > 0;
Result values vary between 0 and 100. Values close to 100 mean that your database has very few errors. When the commit ratio is below 90, a good idea is to configure proper logging, check the logs for errors, build a list of the most frequent ones, and eliminate them step by step.
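As a sketch of such a logging setup, the following postgresql.conf settings make failed statements visible in the server log (the exact thresholds are a matter of taste, and these two settings are in fact the defaults in recent Postgres versions):

```
# postgresql.conf: make errors and the statements that caused them visible
log_min_messages = warning
log_min_error_statement = error
```

With this in place, every query that ends in a rollback due to an error leaves a trace in the log, which is exactly what you need to build the error list mentioned above.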
Database anomalies. Another useful capability of pg_stat_database is anomaly detection. Anomalies are unwanted events occurring in databases, and this stats view provides information about rollbacks, recovery conflicts, deadlocks and temporary files. All these events are unwanted, and if there are too many of them, you should pay attention and try to eliminate their sources.
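All of these counters live in pg_stat_database itself, so they can be pulled with one query; here is a sketch:

```sql
-- Per-database anomaly counters: rollbacks, recovery conflicts,
-- deadlocks and temporary file usage.
SELECT
  datname,
  xact_rollback,
  conflicts,
  deadlocks,
  temp_files,
  pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname IS NOT NULL;
```

A monitoring system that graphs the deltas of these counters makes a sudden burst of rollbacks or deadlocks immediately visible.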
As mentioned above, rollbacks aren’t only aborted transactions: failed queries count too, so check the logs to understand what caused the errors. Another type of anomaly is recovery conflicts, the situation when queries running on standbys are cancelled. From the user’s perspective it looks like a failed query, and when it fails, additional details are written to the postgres logs. A good place to start an investigation is the pg_stat_database_conflicts view; there may be various reasons for conflicts to occur, and this view allows us to understand the exact cause.
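The view breaks conflicts down by cause, one counter per reason; a sketch of the query (the columns are only incremented on standby servers):

```sql
-- Recovery conflicts on a standby, broken down by cause.
SELECT
  datname,
  confl_tablespace,  -- conflicts due to dropped tablespaces
  confl_lock,        -- conflicts due to lock timeouts
  confl_snapshot,    -- conflicts due to old snapshots
  confl_bufferpin,   -- conflicts due to pinned buffers
  confl_deadlock     -- conflicts due to deadlocks
FROM pg_stat_database_conflicts;
```

For example, a high confl_snapshot count hints that long-running standby queries collide with vacuum cleanup on the primary, which is typically addressed by tuning hot_standby_feedback or max_standby_streaming_delay.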
Another issue is deadlocks: two or more transactions have locked resources and are trying to obtain resources already locked by the other transactions. In a normal situation, xact A locks resource A and xact B locks resource B. In a deadlock situation, xact A additionally tries to lock resource B while xact B tries to lock resource A; each is put on hold waiting for a resource held by the other, so neither can proceed. When a deadlock occurs, postgres cancels one of the involved transactions and removes it from the waiting queue so that the other xacts can continue their work.
Load distribution. And to finish, here are a few words on load distribution. This metric isn’t as important as the metrics mentioned above; however, it is sometimes useful when you need to understand the kind of workload in your database. Is your workload write- or read-intensive? The following counters answer this question: tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted. They reveal how many row operations were performed by queries. With your favorite monitoring system, which can make graphs using these numbers, you can quickly track load spikes or collapses and react appropriately.
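As a rough sketch, the read and write counters can be folded into two numbers per database, which makes the read/write split obvious at a glance:

```sql
-- Rough read vs. write workload split per database.
SELECT
  datname,
  tup_returned + tup_fetched AS rows_read,
  tup_inserted + tup_updated + tup_deleted AS rows_written
FROM pg_stat_database
WHERE datname IS NOT NULL;
```

As with the other counters, the absolute values matter less than their rate of change, so it is the graphed deltas that tell you whether the workload is shifting.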
To summarise, the pg_stat_database view contains general information about databases and the events that occurred in them, so tracking numbers from this view is a good starting point that allows you to assess whether everything is fine with your databases. Of course, this view alone is insufficient, and additional resources such as other stats views and logs should be used. In my next post I will continue to explain other stats views and how to use them efficiently.