Multi-transactional statements and statistics for autovacuum

From: "Igor V(dot)Gnatyuk" <i(dot)gnatyuk(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Multi-transactional statements and statistics for autovacuum
Date: 2024-06-12 17:13:19
Message-ID: fc7e7eb4222ad1c8d14696b587b592f8@postgrespro.ru
Lists: pgsql-hackers

Hello.

Before procedures capable of transaction control appeared in PostgreSQL 11,
a statement could contain at most one transaction, so a transaction could
never end in the middle of a statement. Apparently, this is why the
corresponding restriction is described differently in different places of
the documentation:

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS
"...so a query or transaction still in progress does not affect the
displayed totals..."
"...counts actions taken so far within the current transaction..."

Now, however, several transactions can be executed within a single SQL
statement. Yet the current implementation still transfers the accumulated
statistics to shared memory only at the end of the statement, and it is
this data that autovacuum relies on. So when a data-modifying procedure
runs for a long time (e.g. an infinite event-processing loop, such as a
queue implementation), the changes committed inside it do not reach the
shared statistics until the CALL statement finishes. This prevents
autovacuum from making the right cleanup decision in time. To illustrate
this behavior, consider the example below.

Example.

We process the data in the 'test' table. The 'changes' column will show
the number of row updates:

CREATE TABLE test (changes int);

Let's insert a row into the table:

INSERT INTO test VALUES (0);

Each processing step increments the 'changes' column. The processing runs
in a long loop inside the 'process' procedure (see below), and the work of
each loop iteration is committed.

CREATE PROCEDURE process() AS $$
DECLARE
    l_chs int;
BEGIN
    LOOP
        UPDATE test SET changes = changes + 1 RETURNING changes INTO l_chs;
        COMMIT;
        RAISE NOTICE 'changes % -- upd_shared = %, upd_local = %', l_chs,
            (SELECT n_tup_upd FROM pg_stat_all_tables
              WHERE relname = 'test'),  -- shared-memory statistics (used by autovacuum)
            (SELECT n_tup_upd FROM pg_stat_xact_all_tables
              WHERE relname = 'test');  -- statistics within the current transaction
    END LOOP;
END
$$ LANGUAGE plpgsql;

Let's call the procedure:

CALL process();

NOTICE: changes 1 -- upd_shared = 0, upd_local = 1
NOTICE: changes 2 -- upd_shared = 0, upd_local = 2
NOTICE: changes 3 -- upd_shared = 0, upd_local = 3
NOTICE: changes 4 -- upd_shared = 0, upd_local = 4
NOTICE: changes 5 -- upd_shared = 0, upd_local = 5
NOTICE: changes 6 -- upd_shared = 0, upd_local = 6
NOTICE: changes 7 -- upd_shared = 0, upd_local = 7
NOTICE: changes 8 -- upd_shared = 0, upd_local = 8
...

If we now look at the cumulative statistics for the 'test' table from
another session, we will see that although rows are being updated and dead
tuples are accumulating, none of this reaches shared memory:

SELECT n_tup_upd, n_dead_tup, n_ins_since_vacuum, vacuum_count, autovacuum_count
  FROM pg_stat_all_tables WHERE relname = 'test';

 n_tup_upd          | 0
 n_dead_tup         | 0
 n_ins_since_vacuum | 1
 vacuum_count       | 0
 autovacuum_count   | 0
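
This matters because autovacuum decides whether to vacuum a table by
comparing the shared n_dead_tup counter against (roughly)
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples;
while n_dead_tup stays at zero, that trigger can never fire. A rough way
to look at both sides of the comparison (ignoring per-table reloptions):

SELECT s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::float8
       + current_setting('autovacuum_vacuum_scale_factor')::float8
         * c.reltuples AS vacuum_trigger_point
  FROM pg_stat_all_tables s
  JOIN pg_class c ON c.oid = s.relid
 WHERE s.relname = 'test';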

It would be logical to remove this restriction, that is, to flush the
statistics as soon as a transaction completes, even if the statement that
contains it is still running.
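
For anyone experimenting in this area, a convenient way to watch from a
separate psql session when the counters actually reach shared memory is
psql's \watch, e.g.

SELECT now(), n_tup_upd, n_dead_tup, last_autovacuum
  FROM pg_stat_all_tables
 WHERE relname = 'test' \watch 1

With the current behavior these numbers stay frozen for as long as
CALL process() keeps running.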

--
Regards, Igor Gnatyuk
Postgres Professional https://postgrespro.com
