No UPDATE updates

Working with databases, one can’t help but wonder how the data is actually stored on disk.
For example, you need to know this to be able to salvage at least some of the information from damaged data blocks with the help of hexedit.
Today we’ll find out how a data type such as integer is kept on disk. To do that, we will create a test table, review its contents using hexedit, and change the data on disk. For this exercise I will be using PostgreSQL 14.

Let’s create a test table:

create table t_int (n int);
insert into t_int values (351);
insert into t_int values (1000);
insert into t_int values (-351);

Then query the data:

select * from t_int;
n
------
351
1000
-351

That’s right, the rows come back in the same order in which we inserted them.
Let’s find out where our table is stored:

SELECT pg_relation_filepath('t_int');
pg_relation_filepath
----------------------
base/14486/32905

show data_directory;
data_directory
------------------------
/var/lib/pgsql/14/data

Now let’s check what our file contains:

hexedit /var/lib/pgsql/14/data/base/14486/32905

Our rows are not in the file yet. Why does this happen? The problem is that the dirty data pages haven’t been flushed to disk yet.

Let’s run the CHECKPOINT command:

postgres=# checkpoint;
CHECKPOINT

Now check the contents of the file once again. This time there is data in it.
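If hexedit is not at hand, the same bytes can be dumped with a few lines of Python. This is only a sketch: the function name dump_page_tail is made up for illustration, and it assumes the default 8 kB block size and that small rows like ours sit at the very end of the first block.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size


def dump_page_tail(path, nbytes=96, block=0):
    """Return hex lines (32 bytes per line) for the last nbytes of a block."""
    with open(path, "rb") as f:
        # Jump to the end of the requested block, minus the bytes we want.
        f.seek(block * BLOCK_SIZE + BLOCK_SIZE - nbytes)
        tail = f.read(nbytes)
    return [tail[i:i + 32].hex(" ").upper() for i in range(0, len(tail), 32)]


# Usage (run as a user allowed to read the data directory):
# for line in dump_page_tail("/var/lib/pgsql/14/data/base/14486/32905"):
#     print(line)
```

The function only reads the file, so it is safe to run against a test instance.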

Let’s find out what exactly is kept there. To do that, I’ll split what we see into the following sections:

A7 03 00 00  00 00 00 00  00 00 00 00  00 00 00 00  03 00 01 00  00 08 18 00  A1 FE FF FF  00 00 00 00
A6 03 00 00  00 00 00 00  00 00 00 00  00 00 00 00  02 00 01 00  00 08 18 00  E8 03 00 00  00 00 00 00
A5 03 00 00  00 00 00 00  00 00 00 00  00 00 00 00  01 00 01 00  00 08 18 00  5F 01 00 00  00 00 00 00

We see repeated elements and a lot of zeros. For easier reading, let’s remove the run of zeros from each row:

A7 03 00 00  03 00 01 00  00 08 18 00  A1 FE FF FF  00 00 00 00
A6 03 00 00  02 00 01 00  00 08 18 00  E8 03 00 00  00 00 00 00
A5 03 00 00  01 00 01 00  00 08 18 00  5F 01 00 00  00 00 00 00
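The repeated elements in front of the last eight bytes look like the row header. Below is a rough sketch that decodes them, assuming the heap tuple header layout described in the PostgreSQL documentation (t_xmin, t_xmax, t_cid, t_ctid, t_infomask2, t_infomask, t_hoff) and a little-endian machine. The helper name decode_header and the dictionary keys are made up for this example, and the last bytes of each row are left undecoded.

```python
import struct

# Assumed layout of the fixed part of a heap tuple header (23 bytes,
# little-endian): t_xmin, t_xmax, t_cid (uint32 each), t_ctid as
# block-high, block-low, offset (uint16 each), t_infomask2 and
# t_infomask (uint16 each), t_hoff (uint8).
HEADER = struct.Struct("<IIIHHHHHB")

# The three full rows exactly as they appear in the dump.
ROWS = [
    "A7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 00 08 18 00 A1 FE FF FF 00 00 00 00",
    "A6 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 01 00 00 08 18 00 E8 03 00 00 00 00 00 00",
    "A5 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 01 00 00 08 18 00 5F 01 00 00 00 00 00 00",
]


def decode_header(row_hex):
    """Split one dumped row into header fields and the remaining bytes."""
    raw = bytes.fromhex(row_hex)
    xmin, xmax, cid, blk_hi, blk_lo, offset, mask2, mask, hoff = HEADER.unpack_from(raw)
    return {
        "xmin": xmin,
        "xmax": xmax,
        "cid": cid,
        "ctid": ((blk_hi << 16) | blk_lo, offset),  # (block, item number)
        "natts": mask2 & 0x07FF,                    # number of columns
        "infomask": hex(mask),
        "hoff": hoff,                               # where the column data starts
        "payload": raw[hoff:].hex(" ").upper(),     # left undecoded here
    }


for row in ROWS:
    print(decode_header(row))
```

Under these assumptions the three rows decode to xmin 935, 934 and 933, ctid (0, 3), (0, 2) and (0, 1), and hoff 24, which would mean the column data begins at byte 24 of each row.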

If we take the last row and read its value bytes in reverse order (they are stored least significant byte first), here is what we see:

01 5F (hex) = 351 (dec)

We found where the first value is!
I wonder, what is in the row above it?

03 E8 (hex) = 1000 (dec)
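Reading the bytes "from the end" is what a little-endian decode does. A minimal sketch, assuming the value occupies the four bytes shown in the dump (the function name read_uint32_le is made up for this example):

```python
# Value bytes exactly as they appear in the dump (lowest address first).
row1_value = bytes.fromhex("5F 01 00 00")
row2_value = bytes.fromhex("E8 03 00 00")


def read_uint32_le(raw):
    """Interpret 4 bytes as an unsigned little-endian integer."""
    return int.from_bytes(raw, byteorder="little", signed=False)


print(read_uint32_le(row1_value))  # 351
print(read_uint32_le(row2_value))  # 1000
```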

Cool! Now that we have figured out how the integer type is stored on disk, let’s review the last value:

FF FF FE A1 (hex) = 4 294 966 945 (dec)

What?! But it was supposed to be -351!

Digging deeper.

The documentation says that int values lie in the range from -2147483648 to +2147483647, so, including 0, we get 4 294 967 296 (2^32) possible values.

What do you think 4 294 966 945 - 4 294 967 296 equals?

-351

Bingo!
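The same subtraction can be written as a small sketch. The helper name to_signed32 is made up for illustration; Python's built-in int.from_bytes with signed=True gives the same result directly.

```python
UINT32_RANGE = 2 ** 32  # 4 294 967 296 distinct 32-bit patterns


def to_signed32(unsigned):
    """Map an unsigned 32-bit reading onto the range -2^31 .. 2^31 - 1."""
    return unsigned - UINT32_RANGE if unsigned >= 2 ** 31 else unsigned


raw = bytes.fromhex("A1 FE FF FF")
unsigned = int.from_bytes(raw, "little")

print(unsigned)                                    # 4294966945
print(to_signed32(unsigned))                       # -351
print(int.from_bytes(raw, "little", signed=True))  # -351
```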
An alternative way to recover the stored value is the standard procedure for numbers whose MSB (most significant bit) is set, which marks them as negative.
We start with the unsigned reading of the value, write it in binary, invert every bit and add 1. The result is the magnitude of the negative number.

4 294 966 945 = 11111111111111111111111010100001

Inverting and adding 1:

101011110+1 = 1 0101 1111 = 351 (dec)
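The invert-and-add-1 step can also be checked in code. A small sketch, assuming a 32-bit value (the function name invert_and_add_one is made up for this example):

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def invert_and_add_one(pattern):
    """Invert all 32 bits of the pattern and add 1."""
    inverted = ~pattern & MASK32
    return (inverted + 1) & MASK32


stored = 0xFFFFFEA1  # 4 294 966 945

print(format(stored, "032b"))         # 11111111111111111111111010100001
print(format(~stored & MASK32, "b"))  # 101011110
print(invert_and_add_one(stored))     # 351
```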

We now understand how the data is stored on disk. Now let’s use this knowledge to change a value directly in the file (of course, you shouldn’t do this on a “live” database, as you will end up with corrupted data).

For example, let’s replace 351 with 888.
We know that 351 is 01 5F (hex), stored on disk as 5F 01. Using hexedit, we replace these bytes with 888 (dec) = 03 78 (hex), stored as 78 03.
This way we get the following string (I removed the zeros):

A5 03 00 00 … 01 00 01 00 00 08 18 00 78 03 00 00 00 00 00 00
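The replacement bytes do not have to be worked out by hand. The sketch below patches an in-memory copy of the row and never touches the file. It assumes the value sits at byte offset 24 of the 32-byte row, which is where 5F 01 appears in the full dump above; the constant VALUE_OFFSET is a name made up for this example.

```python
VALUE_OFFSET = 24  # position of 5F 01 00 00 inside the 32-byte row

# The full first-inserted row as dumped earlier.
row = bytes.fromhex(
    "A5 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "01 00 01 00 00 08 18 00 5F 01 00 00 00 00 00 00"
)

# Encode 888 the same way the other values are stored: 4 bytes, little-endian.
new_value = (888).to_bytes(4, byteorder="little", signed=True)

# Build the patched row from a copy; the original bytes stay unchanged.
patched = row[:VALUE_OFFSET] + new_value + row[VALUE_OFFSET + 4:]

print(new_value.hex(" ").upper())  # 78 03 00 00
print(patched.hex(" ").upper())
```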

Let’s query the table:

select * from t_int;
n
------
351
1000
-351

The query still returns 351. Why does this happen? To answer this question we need EXPLAIN (analyze, buffers):

explain (analyze, buffers) select * from t_int;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on t_int (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.007 rows=3 loops=1)
Buffers: shared hit=1
Planning Time: 0.029 ms
Execution Time: 0.017 ms

As you can see, PostgreSQL read one block from its cache (shared hit=1). The copy in RAM is not aware that we changed the data on disk.
Let’s restart Postgres and clear the OS cache:

#!/bin/sh
# restart PostgreSQL
systemctl stop postgresql-14
systemctl start postgresql-14
# flush pending writes, then drop the OS page cache (requires root)
sync; echo 3 > /proc/sys/vm/drop_caches

Now we can query our table again:

select * from t_int;
n
------
888
1000
-351

So we changed the data on disk and queried the table, which returned outdated data. Seeing “Buffers: shared hit=1” gave us a hint that PostgreSQL wasn’t reading the data from disk while performing the query, but rather from the cache in RAM. To empty the cache, I restarted Postgres and dropped the OS cache. After that I queried the table once again, and this time the query returned the updated data, since the cache was clear.

Hope this exercise was helpful and provided some insight into how data is stored on disk and how to go about investigating this storage.
Have you tried exploring this yourself? What have you learned? I’d be happy to hear from you in the comments below!

2 comments on “No UPDATE updates”

Anonymous:

April 5, 2022 at 7:34 am

Somebody needs to turn on data_checksums

- Alexander Nikitin:
  
  April 5, 2022 at 9:19 am
  
  Correct, you can certainly switch on checksums, this will lead to filling two bytes at the header of the page. Keep in mind, that we have access to data block and in the same way can change checksum using hexedit.

