Deep dive into postgres stats: pg_stat_activity and pg_locks
Today, I would like to discuss another handy stats views — pg_stat_activity. This is really useful and powerful view the value of which is hard to measure. In short, the main goal of pg_stat_activity is to show the current activity in Postgres. Hence, if you would like to know what is going on in the database at any given moment, pg_stat_activity is the first place to check. Having worked with Postgres for several years now I’ve accumulated many useful queries. The majority of which I’ve already mentioned in the first article in these series. Though that post is two years old now, I’d recommend reading it to those who might have missed it. There, I talk about basic examples related to pg_stat_activity usage, such as checking current client activity, viewing clients’ states, checking transactions and queries ages, etc. Of course, since then PostgreSQL moved on and pg_stat_activity was slightly changed in the newer versions. In today’s post I am going to cover new aspects and some advanced techniques related to pg_stat_activity and another useful view — pg_locks.
Wait events is the new feature that has been introduced in the newer versions of PostgreSQL. In older versions you were able to check states of connected clients and get a little idea about what they actually do. The most curious state and, at the same time, the most unwanted one is waiting state. Pg_stat_activity has boolean column named “waiting” where “true” value means that the client is currently in a waiting state and is not performing. When waiting clients are detected, the next step is to check another view — pg_locks that helps to locate locked and locking clients. That’s pretty much it, PostgreSQL doesn’t provide any other ways to investigate the reason behind the waiting client, especially in cases when clients are waiting on events not related to inter-transactional communications. Wait events, that would allow us to get more information, were introduced in 9.6 where waiting column was replaced by two new columns: wait_event_type and wait_event. First column describes a category and second — a particular reason for which client is waiting for.
Thus, since 9.6 pg_stat_activity was extended and DBAs get advanced interface for understanding reasons behind waiting clients. Full list of the available event types and events can be obtained from the official documentation. Moreover, list of supported wait events has been extended in the upcoming PostgreSQL 10 where pg_stat_activity has become more informative. From DBA’s point of view, the usage of the new columns is quite simple, you just need to add them into your favourite pg_stat_activity’s query. In my example, I have concatenated these columns into one and added column which is based on the wait_event_type. The “waiting” column is repeating the original column that has been now removed from the previous version and shows the status in cases when client is blocked by another one. In addition, here is some more useful information about client and its transaction and query.
SELECT (clock_timestamp() - xact_start) AS xact_age,
(clock_timestamp() - query_start) AS query_age,
(clock_timestamp() - state_change) AS change_age,
pid, state, datname, usename,
coalesce(wait_event_type = 'Lock', 'f') AS waiting,
wait_event_type ||'.'|| wait_event as wait_details,
client_addr ||'.'|| client_port AS client,
query
FROM pg_stat_activity
WHERE clock_timestamp() - coalesce(xact_start, query_start) > '00:00:00.1'::interval
AND pid <> pg_backend_pid() AND state <> 'idle'
ORDER BY coalesce(xact_start, query_start);
When we discuss “waitings”, it’s worth mentioning pg_locks view. This view is not used very often and mainly designed for those situations when queries start competing for resources and lock neighbouring queries and transactions. Pg_locks view is unfortunately not extended to the wait events and only shows heavyweight locks that protect various database objects such as rows and tables, still pg_locks view, in my opinion, is very informative part of PostgreSQL stats.
Another option is using pg_locks in combination with pg_stat_activity. Jointly using pg_stat_activity and pg_locks allows us to get additional information about locked and locking clients, their ages and executed queries. Below is one of the queries that helps me in dark days. I recommend to run in expanded mode and examine first 5 records which shows the longest queries.
SELECT
COALESCE(l1.relation::regclass::text,l1.locktype) as locked_item,
w.wait_event_type as waiting_ev_type, w.wait_event as waiting_ev, w.query as waiting_query,
l1.mode as waiting_mode,
(select now() - xact_start as waiting_xact_duration from pg_stat_activity where pid = w.pid),
(select now() - query_start as waiting_query_duration from pg_stat_activity where pid = w.pid),
w.pid as waiting_pid, w.usename as waiting_user, w.state as waiting_state,
l.wait_event_type as locking_ev_type, l.wait_event_type as locking_ev, l.query as locking_query,
l2.mode as locking_mode,
(select now() - xact_start as locking_xact_duration from pg_stat_activity where pid = l.pid),
(select now() - query_start as locking_query_duration from pg_stat_activity where pid = l.pid),
l.pid as locking_pid, l.usename as locking_user, l.state as locking_state
FROM pg_stat_activity w
JOIN pg_locks l1 ON w.pid = l1.pid AND NOT l1.granted
JOIN pg_locks l2 ON (l1.transactionid = l2.transactionid AND l1.pid != l2.pid)
OR (l1.database = l2.database AND l1.relation = l2.relation and l1.pid != l2.pid)
JOIN pg_stat_activity l ON l2.pid = l.pid
WHERE w.wait_event is not null and w.wait_event_type is not null
ORDER BY l.query_start,w.query_start;
Another example of pg_locks and pg_stat_activity joint usage is lock tree which is extremely useful when postgres is drowning in the waiting clients. It has been written by my colleague, Victor Yegorov who has been inspired by and example from ORACLE. When dozens of clients are in a waiting state, it seems that there is no other way to pg_terminate’em all. But using this query, it is possible to quickly find out the root cause of the occurred waitings and make one single «kill» to resolve the issue at once.
WITH RECURSIVE l AS (
SELECT pid, locktype, mode, granted,
ROW(locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid) obj
FROM pg_locks
), pairs AS (
SELECT w.pid waiter, l.pid locker, l.obj, l.mode
FROM l w
JOIN l ON l.obj IS NOT DISTINCT FROM w.obj AND l.locktype=w.locktype AND NOT l.pid=w.pid AND l.granted
WHERE NOT w.granted
), tree AS (
SELECT l.locker pid, l.locker root, NULL::record obj, NULL AS mode, 0 lvl, locker::text path, array_agg(l.locker) OVER () all_pids
FROM ( SELECT DISTINCT locker FROM pairs l WHERE NOT EXISTS (SELECT 1 FROM pairs WHERE waiter=l.locker) ) l
UNION ALL
SELECT w.waiter pid, tree.root, w.obj, w.mode, tree.lvl+1, tree.path||'.'||w.waiter, all_pids || array_agg(w.waiter) OVER ()
FROM tree JOIN pairs w ON tree.pid=w.locker AND NOT w.waiter = ANY ( all_pids )
)
SELECT (clock_timestamp() - a.xact_start)::interval(3) AS ts_age,
replace(a.state, 'idle in transaction', 'idletx') state,
(clock_timestamp() - state_change)::interval(3) AS change_age,
a.datname,tree.pid,a.usename,a.client_addr,lvl,
(SELECT count(*) FROM tree p WHERE p.path ~ ('^'||tree.path) AND NOT p.path=tree.path) blocked,
repeat(' .', lvl)||' '||left(regexp_replace(query, 's+', ' ', 'g'),100) query
FROM tree
JOIN pg_stat_activity a USING (pid)
ORDER BY path;
To conclude, pg_stat_activity and pg_locks are super helpful and it’s great to see that they are getting new features. Of course, I hope that pg_locks will also be extended with wait events as well as pg_stat_activity.
Hope this was helpful to you and if you have any questions please comment below.