Introduction to pg_stat_activity.
PostgreSQL RDBMS has a very informative built-in tools to track the state of the database. One such tool is
pg_stat_activity. This is a system view that allows to monitor the
databases processes in real time. This view is comparable to the system command
top, it is one of the first places from which database administrator can
start an investigation if some problem occurs. In this post I explain some useful examples how this view can be used to detect abnormal activity. To start, we need just some postgresql client, psql for example.
As a first step you can use an easy query:
# SELECT * FROM pg_stat_activity;
is shared among databases. pg_stat_activity
shows single line for the each database connection. There is a dedicated UNIX process for each connection. For one connection there are several attributes:
pid
— process ID which handles the connection. In the PostgreSQL terminology this process is called backend or worker.
datid, datname — ID and name of the database to which client connects.
usesysid, usename — ID and name which was used to connect.
client_addr, client_port, client_hostname
— network settings of the client, network address, port and hostname. application_name
— an arbitrary name that can be specified when client connects or a session varialble.
backend_start, xact_start, query_start, state_change
— timestamps, which indicate when the process (backend), transaction or query within a transaction has been started, and the last time the state of the process was changed.
state, waiting
— state of the process, and a flag which indicates if the process is waiting for another process.
backend_xid, backend_xmin — running transaction ID (
xid) and transaction ID (xmin
) which determines visibility scope for currently running transaction. query
— text of the query which currently runs by a worker or was recently executed in this connection.Since each line describe a single connection, we can easily find out how many client connections established to the database.
# SELECT count(*) FROM pg_stat_activity WHERE client_addr = ‘10.0.20.26’;
Of course, we can use different conditions and filter other fields, including and combining them with AND and OR.
Very important attributes are the transaction and query start time. Using the current time, we can calculate the duration of a transactions and queries. This is very useful for the long transaction detection.
# SELECT
client_addr, usename, datname,
clock_timestamp() — xact_start AS xact_age,
clock_timestamp() — query_start AS query_age,
state, query
FROM pg_stat_activity
ORDER BY coalesce(xact_start, query_start);
-[ RECORD 1 ]------------------------------------------------------- client_addr | 10.0.20.26 usename | james datname | sales_db xact_age | query_age | 12 days 05:52:09.181345 state | idle query | <query text>
As we can see in the output, we have a request which runs for 12 days. This is not an active transaction because xact_age field is empty. Using state field we can figure out the connection state — currently it’s
idle connection. Most likely James executed some query at Friday, not disconnected from the database and went on vacation for two weeks.
As mentioned above, in this example, we use a state field. Let us examine this in more details. The state field determines the current status of the connection and can be in one of several states:
active — this operating state of the process means that the process executes the query, that is, doing a useful work.
idle — is idling, no useful work is being done.
idle in transaction — is idling in an open transaction. This means that the application opened the transaction and does nothing anymore. Long running transactions (more than one hour) with such status are harmful to the
database and should be forced to close, and the causes of such behavior at the application level should be removed
idle in transaction (aborted) — is a broken transaction,at least one of the queries inside the transaction was failed, and other queries will be ignored, until transaction aborts.
fastpath function call — the backend is executing a fast-path function.
disabled — this is a dummy state, it is shown only if the
track_activities option is disabled.
Now we have an idea of the connection statuses, we can see the total activity in the database.
# SELECT
client_addr, usename, datname, state, count(*)
FROM pg_stat_activity GROUP BY 1, 2, 3, 4
ORDER BY 5 DESC;
client_addr | usename | datname | state | count
-------------+----------+----------+---------------------+-------
127.0.0.1 | app_user | sales_db | idle | 28
127.0.0.1 | app_user | sales_db | active | 15
127.0.0.1 | app_user | sales_db | idle in transaction | 3
127.0.0.1 | bg_user | sales_db | active | 6
10.11.2.12 | james | sales_db | idle | 2
10.0.20.26 | helen | shop_db | active | 1
Note, in example above the most of the connections are idle. If the total number of idle connections is dozens or several hundreds, then you definitely need to start thinking towards using
pgbouncer, to reduce the number of idle processes. We also can see
idle in transaction processes, it is important to monitor them and close them using pg_terminate_backend() if they hang too long. To determine the age of such connections is not a big problem, because we already
know how to do this.
# SELECT
client_addr, usename, datname,
clock_timestamp() — xact_start AS xact_age,
clock_timestamp() — query_start AS query_age, state, query
FROM pg_stat_activity
WHERE state = ‘idle in transaction’ ORDER BY
coalesce(xact_start,query_start);
client_addr | usename | datname | xact_age | query_age | state | query ------------+----------+----------+-----------------+-----------------+---------------------+------------- 127.0.0.1 | app_user | sales_db | 00:00:06.001629 | 00:00:00.002542 | idle in transaction | <query text> 127.0.0.1 | app_user | sales_db | 00:00:05.006710 | 00:00:00.003561 | idle in transaction | <query text> 127.0.0.1 | app_user | sales_db | 00:00:00.009004 | 00:00:00.001629 | idle in transaction | <query text>
As we can see the age of the transaction is relatively small, so we can filter the short queries. Also we filter idle connections, because they are not so interesting. Let’s filter out everything which is faster than 10 seconds.
# SELECT
client_addr, usename, datname,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
state, query
FROM pg_stat_activity
WHERE (
(now() - xact_start) > '00:00:10'::interval OR
(now() - query_start) > '00:00:10'::interval AND
state <> 'idle'
)
ORDER BY coalesce(xact_start, query_start);
client_addr | usename | datname | xact_age | query_age | state | query
------------+----------+----------+-----------------+-----------------+---------------------+-------------
127.0.0.1 | app_user | sales_db | 00:00:12.013319 | 00:00:05.002151 | active | <query text>
127.0.0.1 | app_user | sales_db | 00:00:10.083718 | 00:00:10.083718 | idle in transaction | <query text>
But that’s not the end, there can be situations when a query is blocked by another request or transaction. To identify such connections the waiting attribute can be used (true if disabled, false if no locks).
# SELECT
client_addr, usename, datname,
now() — xact_start AS xact_age,
now() — query_start AS query_age,
state, waiting, query
FROM pg_stat_activity
WHERE waiting
ORDER BY coalesce(xact_start, query_start);
client_addr | usename | datname | xact_age | query_age | state | waiting | query
------------+----------+----------+-----------------+-----------------+--------+---------+-------------
127.0.0.1 | app_user | sales_db | 00:00:16.736127 | 00:00:02.839100 | active | t | <query text>
Presence of waiting processes is a bad sign, it is usually an evidence of poor designed application. Locking occurs in a situation where two or more concurrent transactions (or queries) try to access the same resource, such as a set
of rows of a table. A simple example, transaction A update set of rows M, transaction B when tries to update the same set of rows and will wait until the transaction A will be either commited or aborted.
Besides pg_stat_activity, such situations can be tracked down in the postgresql log, in case when
log_lock_waits is enabled.
[UPDATE waiting] LOG: process 29054 still waiting for ShareLock on transaction 2034635 after 1000.160 ms
[UPDATE waiting] DETAIL: Process holding the lock: 29030. Wait queue: 29054.
[UPDATE waiting] CONTEXT: while updating tuple (0,68) in relation «products»
[UPDATE waiting] STATEMENT: update products set price = 20 where id = 1;
[UPDATE waiting] LOG: process 29054 acquired ShareLock on transaction 2034635 after 9049.000 ms
[UPDATE waiting] CONTEXT: while updating tuple (0,68) in relation «products»
[UPDATE waiting] STATEMENT: update products set price = 20 where id = 1;
What’s going on here?
- The process with PID 29054 is blocked and is waiting. Process with PID 29030 holds the lock.
- The text of the waiting query was logged too.
- After the 9 seconds through the process with PID 29054 obtain resources and was able to complete the query.
A deadlock situation is also possible. That happens when to commit some transaction PostgreSQL needs to acquire a lock on the resource, currently blocked by another transaction, which can release the lock only after obtaining
another lock on the resource currently blocked by the first transaction. When this situation occurs, postgres deadlock detection mechanism terminates one of the transactions and that allows another transactions to proceed. Default
deadlock timeout is 1 second and can be configured through
deadlock_timeout. Such situations are also logged in the postgresql log.
[UPDATE] ERROR: deadlock detected
[UPDATE] DETAIL: Process 29054 waits for ShareLock on transaction 2034637; blocked by process 29030.
Process 29030 waits for ShareLock on transaction 2034638; blocked by process 29054.
Process 29054: update products set price = 20 where id = 1;
Process 29030: update products set price = 20 where id = 2;
[UPDATE] HINT: See server log for query details.
[UPDATE] CONTEXT: while updating tuple (0,68) in relation «products»
[UPDATE] STATEMENT: update products set price = 20 where id = 1;
[UPDATE waiting] LOG: process 29030 acquired ShareLock on transaction 2034638 after 2924.914 ms
[UPDATE waiting] CONTEXT: while updating tuple (0,69) in relation «products»
[UPDATE waiting] STATEMENT: update products set price = 20 where id = 2;
what was happened here?
- The process with PID 29054 waits, as it was blocked by a process with PID 29030.
- In turn, the process with PID 29030 waits too, as it was blocked by a process with PID 29054.
- We can see further queries because of which deadlock was occured.
The process with PID 29030 was unlocked, so query within the process 29054 has been forced to cancel and transaction moved to «idle in transaction (aborted)» state.
Anyway, the waiting processes and deadlocks are bad, such incidents must be investigated and must be eliminated. Maybe someone will ask: Why they are so bad? I will answer, while the process is blocked, the application which sent
the query also waits, in this case the end user may think that the system slows down and can be frustrated. Nobody likes long response time.
To wrap it up, now you know how to use pg_stat_activity effectively, and can create your own view. It will be the first mate when something in the database goes out of control. Here is an example of a view which shows
abnormal activity in the database.
# CREATE VIEW db_activity AS SELECT pid, client_addr, client_port, datname, usename, clock_timestamp() — pg_stat_activity.xact_start AS ts_age, clock_timestamp() — pg_stat_activity.query_start AS query_age, clock_timestamp() —
pg_stat_activity.state_change AS change_age, waiting, state, query FROM pg_stat_activity WHERE (
OR
ORDER BY COALESCE (xact_start, query_start);
That is all. Thanks!