Deep dive into postgres stats: pg_stat_activity and pg_locks

DBA’s best friends: pg_stat_activity and pg_locks

Today, I would like to discuss another handy stats views — pg_stat_activity. This is really useful and powerful view the value of which is hard to measure. In short, the main goal of pg_stat_activity is to show the current activity in Postgres. Hence, if you would like to know what is going on in the database at any given moment, pg_stat_activity is the first place to check. Having worked with Postgres for several years now I’ve accumulated many useful queries. The majority of which I’ve already mentioned in the first article in these series. Though that post is two years old now, I’d recommend reading it to those who might have missed it. There, I talk about basic examples related to pg_stat_activity usage, such as checking current client activity, viewing clients’ states, checking transactions and queries ages, etc. Of course, since then PostgreSQL moved on and pg_stat_activity was slightly changed in the newer versions. In today’s post I am going to cover new aspects and some advanced techniques related to pg_stat_activity and another useful view — pg_locks.

Wait events is the new feature that has been introduced in the newer versions of PostgreSQL. In older versions you were able to check states of connected clients and get a little idea about what they actually do. The most curious state and, at the same time, the most unwanted one is waiting state. Pg_stat_activity has boolean column named “waiting” where “true” value means that the client is currently in a waiting state and is not performing. When waiting clients are detected, the next step is to check another view — pg_locks that helps to locate locked and locking clients. That’s pretty much it, PostgreSQL doesn’t provide any other ways to investigate the reason behind the waiting client, especially in cases when clients are waiting on events not related to inter-transactional communications. Wait events, that would allow us to get more information, were introduced in 9.6 where waiting column was replaced by two new columns: wait_event_type and wait_event. First column describes a category and second — a particular reason for which client is waiting for.

Thus, since 9.6 pg_stat_activity was extended and DBAs get advanced interface for understanding reasons behind waiting clients. Full list of the available event types and events can be obtained from the official documentation. Moreover, list of supported wait events has been extended in the upcoming PostgreSQL 10 where pg_stat_activity has become more informative. From DBA’s point of view, the usage of the new columns is quite simple, you just need to add them into your favourite pg_stat_activity’s query. In my example, I have concatenated these columns into one and added column which is based on the wait_event_type. The “waiting” column is repeating the original column that has been now removed from the previous version and shows the status in cases when client is blocked by another one. In addition, here is some more useful information about client and its transaction and query.

SELECT (clock_timestamp() - xact_start) AS xact_age,

       (clock_timestamp() - query_start) AS query_age,

       (clock_timestamp() - state_change) AS change_age,

       pid, state, datname, usename,

       coalesce(wait_event_type = 'Lock', 'f') AS waiting,

       wait_event_type ||'.'|| wait_event as wait_details,

       client_addr ||'.'|| client_port AS client,

       query

FROM pg_stat_activity

WHERE clock_timestamp() - coalesce(xact_start, query_start) > '00:00:00.1'::interval

AND pid <> pg_backend_pid() AND state <> 'idle'

ORDER BY coalesce(xact_start, query_start);

When we discuss “waitings”, it’s worth mentioning pg_locks view. This view is not used very often and mainly designed for those situations when queries start competing for resources and lock neighbouring queries and transactions. Pg_locks view is unfortunately not extended to the wait events and only shows heavyweight locks that protect various database objects such as rows and tables, still pg_locks view, in my opinion, is very informative part of PostgreSQL stats.

Another option is using pg_locks in combination with pg_stat_activity. Jointly using pg_stat_activity and pg_locks allows us to get additional information about locked and locking clients, their ages and executed queries. Below is one of the queries that helps me in dark days. I recommend to run in expanded mode and examine first 5 records which shows the longest queries.

SELECT

  COALESCE(l1.relation::regclass::text,l1.locktype) as locked_item,

  w.wait_event_type as waiting_ev_type, w.wait_event as waiting_ev, w.query as waiting_query,

  l1.mode as waiting_mode,

  (select now() - xact_start as waiting_xact_duration from pg_stat_activity where pid = w.pid),

  (select now() - query_start as waiting_query_duration from pg_stat_activity where pid = w.pid),

  w.pid as waiting_pid, w.usename as waiting_user, w.state as waiting_state,

  l.wait_event_type as locking_ev_type, l.wait_event_type as locking_ev, l.query as locking_query,

  l2.mode as locking_mode,

  (select now() - xact_start as locking_xact_duration from pg_stat_activity where pid = l.pid),

  (select now() - query_start as locking_query_duration from pg_stat_activity where pid = l.pid),

  l.pid as locking_pid, l.usename as locking_user, l.state as locking_state

FROM pg_stat_activity w

JOIN pg_locks l1 ON w.pid = l1.pid AND NOT l1.granted

JOIN pg_locks l2 ON (l1.transactionid = l2.transactionid AND l1.pid != l2.pid)

    OR (l1.database = l2.database AND l1.relation = l2.relation and l1.pid != l2.pid)

JOIN pg_stat_activity l ON l2.pid = l.pid

WHERE w.wait_event is not null and w.wait_event_type is not null

ORDER BY l.query_start,w.query_start;

Another example of pg_locks and pg_stat_activity joint usage is lock tree which is extremely useful when postgres is drowning in the waiting clients. It has been written by my colleague, Victor Yegorov who has been inspired by and example from ORACLE. When dozens of clients are in a waiting state, it seems that there is no other way to pg_terminate’em all. But using this query, it is possible to quickly find out the root cause of the occurred waitings and make one single «kill» to resolve the issue at once.

WITH RECURSIVE l AS (

  SELECT pid, locktype, mode, granted,

 ROW(locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid) obj

  FROM pg_locks

), pairs AS (

  SELECT w.pid waiter, l.pid locker, l.obj, l.mode

  FROM l w

  JOIN l ON l.obj IS NOT DISTINCT FROM w.obj AND l.locktype=w.locktype AND NOT l.pid=w.pid AND l.granted

  WHERE NOT w.granted

), tree AS (

  SELECT l.locker pid, l.locker root, NULL::record obj, NULL AS mode, 0 lvl, locker::text path, array_agg(l.locker) OVER () all_pids

  FROM ( SELECT DISTINCT locker FROM pairs l WHERE NOT EXISTS (SELECT 1 FROM pairs WHERE waiter=l.locker) ) l

  UNION ALL

  SELECT w.waiter pid, tree.root, w.obj, w.mode, tree.lvl+1, tree.path||'.'||w.waiter, all_pids || array_agg(w.waiter) OVER ()

  FROM tree JOIN pairs w ON tree.pid=w.locker AND NOT w.waiter = ANY ( all_pids )

)

SELECT (clock_timestamp() - a.xact_start)::interval(3) AS ts_age,

       replace(a.state, 'idle in transaction', 'idletx') state,

       (clock_timestamp() - state_change)::interval(3) AS change_age,

       a.datname,tree.pid,a.usename,a.client_addr,lvl,

       (SELECT count(*) FROM tree p WHERE p.path ~ ('^'||tree.path) AND NOT p.path=tree.path) blocked,

       repeat(' .', lvl)||' '||left(regexp_replace(query, 's+', ' ', 'g'),100) query

FROM tree

JOIN pg_stat_activity a USING (pid)

ORDER BY path;

To conclude, pg_stat_activity and pg_locks are super helpful and it’s great to see that they are getting new features. Of course, I hope that pg_locks will also be extended with wait events as well as pg_stat_activity.

Hope this was helpful to you and if you have any questions please comment below.

Комментарии: 6

6 комментариев на «“Deep dive into postgres stats: pg_stat_activity and pg_locks”»

Unknown:

20 октября, 2017 в 4:01 пп

Hi,

Very good article Alexey !
The last example is fantastic!

Best regards,

Alexandre

Ответить
Alexey Lesovsky:

20 октября, 2017 в 6:05 пп

Thanks, Alexandre!

Ответить
alf delgado:

13 июля, 2020 в 7:46 пп

Hi!
Great examples, but the typographical marks (quotes and dashes, for example) make it impossible to copy and paste. Could you change the code snippets into something that can be clipped?

Ответить
- Valeria K:
  
  15 июля, 2020 в 10:21 дп
  
  Hi, alf delgado! Thanks for noticing! It seems that the platform we are using misplaced some of the characters. It has been corrected now. Thanks again.
  
  Ответить
Fabrice:

24 марта, 2022 в 12:26 пп

Best analysis ever

Ответить
- Venera Kasimova:
  
  25 марта, 2022 в 6:39 дп
  
  Thank you, Fabrice! )
  
  Ответить

Добавить комментарий Отменить ответ

Базовый	Премиум	Enterprise
До 10 серверов	До 40 серверов	До 100 серверов
Чат, аварийный телефон	Чат, аварийный телефон	Чат, аварийный телефон
до 10 часов работы DBA/месяц*	до 25 часов работы DBA/месяц*	до 60 часов работы DBA/месяц*
SLA проблема — до 1 ч., стандартные работы — до 8 ч.	SLA проблема — до 1 ч., стандартные работы — до 3 ч.	SLA проблема — до 1 ч., стандартные работы — до 3 ч.
24/7 SLA на аварии — 1 ч.	24/7 SLA на аварии — 1 ч.	24/7 SLA на аварии — 30 мин
Автоматические Health Check	Автоматические Health Check с рекомендациями от DBA	Индивидуальная проверка ваших БД нашими DBA
Цена может варьировать в зависимости от индивидуальных требований клиента и обсуждается индивидуально. Настоящее предложение не является публичной офертой.Возможно платное увеличение лимита часов на базовые работы.При выработке лимита часов, включенных в пакет, дополнительные часы оплачиваются по дополнительному тарифу. По предварительной договоренности возможно увеличение лимита часов, включенных в пакет, по сниженному тарифу. Указанные условия, включая стоимость оказываемых услуг в рублях РФ, могут быть изменены в зависимости от согласованных в дальнейшем существенных условий договора и предпочтительной для клиента валюты платежа. Минимальная длительность контракта — 6 месяцев.*

Базовый

Премиум

Enterprise

До 10 серверов

До 40 серверов

До 100 серверов

Чат, аварийный телефон

до 10 часов работы DBA/месяц*

до 25 часов работы DBA/месяц*

до 60 часов работы DBA/месяц*

SLA

проблема — до 1 ч.,
стандартные работы — до 8 ч.

SLA

проблема — до 1 ч.,
стандартные работы — до 3 ч.

SLA

проблема — до 1 ч.,
стандартные работы — до 3 ч.

24/7 SLA на аварии — 1 ч.

24/7 SLA на аварии — 30 мин

Автоматические Health Check

Автоматические Health Check с рекомендациями от DBA

Индивидуальная проверка ваших БД нашими DBA

Цена может варьировать в зависимости от индивидуальных требований клиента и обсуждается индивидуально.
Настоящее предложение не является публичной офертой.*Возможно платное увеличение лимита часов на базовые работы.**При выработке лимита часов, включенных в пакет, дополнительные часы оплачиваются по дополнительному тарифу.
По предварительной договоренности возможно увеличение лимита часов, включенных в пакет, по сниженному тарифу.

Указанные условия, включая стоимость оказываемых услуг в рублях РФ, могут быть изменены в зависимости от согласованных в дальнейшем существенных условий договора и предпочтительной для клиента валюты платежа. Минимальная длительность контракта — 6 месяцев.

select case when setting::bigint < 90600 then 'Вы используете старую версию PostgreSQL, которая более не поддерживается сообществом.'||chr(10)|| 'Рекомендуем вам перейти на последнюю актуальную версию как можно скорее.' when setting::bigint < 100000 then 'Вы используете старую версию PostgreSQL, которая пока что поддерживается сообществом.'||chr(10)|| 'Рекомендуем вам перейти на последнюю актуальную версию.' when setting::bigint < 110000 then 'Вы используете достаточно современную версию PostgreSQL, которая активно поддерживается сообществом.'||chr(10)|| 'У вас все неплохо, но можно обновиться и на последнюю актуальную версию при возможности.' when setting::bigint < 140000 then 'Вы пользуетесь одной из самых последних версий PostgreSQL.'||chr(10)|| 'У вас все отлично.' else 'Вы используете версию которая находится в разработке,'||chr(10)|| 'если это production, то рекомендуем вам перейти на стабильную версию PostgreSQL.' end as "Проверка мажорной версии PostgreSQL" , case when setting::bigint between 130002 and 139999 or setting::bigint between 120006 and 129999 or setting::bigint between 110010 and 119999 or setting::bigint between 100015 and 109999 or setting::bigint between 90620 and 90699 then 'У вас стоит один из последних патчей PostgreSQL для вашей версии.'||chr(10)|| 'Похоже вы следите за обновлениями PostgreSQL. Это хороший факт.' else 'Похоже вы не обновляли PostgreSQL, после установки/последнего мажорного обновления, совсем.'||chr (10)|| 'Это плохо, рекомендуем вам обновиться до последней актуальной версии PostgreSQL.' end as "Проверка минорной версии PostgreSQL" , 'Актуальные версии на данный момент следующие, в порядке убывания актуальности:'||chr (10)|| '13.3, 12.7, 11.12, 10.17, 9.6.22' as "Список актуальных версий" from pg_settings where name = 'server_version_num';

SELECT now()-pg_postmaster_start_time() "Uptime", now()-stats_reset "Minutes since stats reset", round(100.0*checkpoints_req/checkpoints,1) "Forced checkpoint ratio (%)", round(min_since_reset/checkpoints,2) "Minutes between checkpoints", round(checkpoint_write_time::numeric/(checkpoints*1000),2) "Average write time per checkpoint (s)", round(checkpoint_sync_time::numeric/(checkpoints*1000),2) "Average sync time per checkpoint (s)", round(total_buffers/pages_per_mb,1) "Total MB written", round(buffers_checkpoint/(pages_per_mb*checkpoints),2) "MB per checkpoint", round(buffers_checkpoint/(pages_per_mb*min_since_reset*60),2) "Checkpoint MBps" FROM ( SELECT checkpoints_req, checkpoints_timed + checkpoints_req checkpoints, checkpoint_write_time, checkpoint_sync_time, buffers_checkpoint, buffers_checkpoint + buffers_clean + buffers_backend total_buffers, stats_reset, round(extract('epoch' from now() - stats_reset)/60)::numeric min_since_reset, (1024.0 * 1024 / (current_setting('block_size')::numeric))pages_per_mb FROM pg_stat_bgwriter ) bg

Новости и Блог Назад