Re: Vacuum statistics

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jim Nasby <jnasby(at)upgrade(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Andrei Zubkov <zubkov(at)moonset(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, a(dot)lepikhov(at)postgrespro(dot)ru
Subject: Re: Vacuum statistics
Date: 2025-01-13 11:19:46
Message-ID: 1e81a0a1-a63b-48fb-905a-d6495f89ab73@postgrespro.ru
Lists: pgsql-hackers

Hi, all! I have updated the patches and solved the problems with parallel
vacuum.

On 19.12.2024 13:37, Alena Rybakina wrote:
>
> Hi! I updated the patch.
>
> On 02.12.2024 23:12, Alena Rybakina wrote:
>> On 02.12.2024 11:27, Alexander Korotkov wrote:
>>> Hi, Alena!
>>>
>>> On Wed, Nov 13, 2024 at 6:21 PM Alena Rybakina
>>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>>> Updated 0001-v13 attached, as well as the diff between v12 and v13.
>>>>
>>>> Thank you)
>>>>
>>>> I agree with your changes and have included them in the patches.
>>> Thank you for the updated patchset. Some points from me.
>>>
>>> * I've read the previous discussion on how important it is to keep all
>>> these fields regarding vacuum statistics, including the points by Andrei
>>> and Jim. It still worries me that the statistics volume is going to grow
>>> about 3 times, but I don't have a particular proposal for a more
>>> granular approach. I wonder if you could propose something.
>
> I propose to solve this with a GUC (track_vacuum_statistics) that will
> allow us to disable statistics collection or enable it again when needed.
> These statistics are useful if you collect them over a certain period of
> time and watch how they change, so I think such a GUC will not hurt here.
>
> I also added a new statistic, vm_new_visible_frozen_pages, because of
> commit dc6acfd, and renamed some of my statistics: frozen_pages to
> vm_new_frozen_pages and all_visible_pages to vm_new_visible_pages. I also
> added the statistics missed_tuples and missed_pages. Both are needed to
> account for how many tuples were not cleaned up by vacuum due to a failure
> to acquire a cleanup lock on a heap page. The second statistic,
> missed_pages, lets you track whether this is one particular page or not.
>
> This information can indicate, for example, that the data is corrupted
> somewhere or that something is going wrong in the operation of the
> database.
>
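To inspect the missed_tuples and missed_pages counters mentioned above, a
query along these lines can be used (just a sketch; it assumes both counters
are exposed as columns of the pg_stat_vacuum_tables view from the patch set):

-- check whether the skipped dead tuples are concentrated on a single page
select relname, missed_tuples, missed_pages
from pg_stat_vacuum_tables where relname like '%accounts%';

If missed_pages stays at one while missed_tuples keeps growing, that would
suggest the cleanup lock keeps failing on the same page.
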
I changed the purpose of the GUC. By default, database-level vacuum
statistics are always collected, and enabling track_vacuum_statistics turns
on collection of extended statistics for the relations of a given database.
This is done to strike a balance between the memory allocated for storing
statistics and having the monitoring needed to track the state of vacuum
operations.
Besides the fact that there are far fewer databases in a system than
relations, the vacuum statistics for a database contain fewer fields than
those for relations, but they are enough to indicate that something may be
wrong in the system and prompt the administrator to enable extended
monitoring for relations.

By default, the GUC is disabled.
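
For example, extended per-relation statistics could then be switched on with
something like the following (a sketch; it assumes the GUC can be changed
with ALTER SYSTEM and picked up on a configuration reload):

ALTER SYSTEM SET track_vacuum_statistics = on;
SELECT pg_reload_conf();
-- verify the current setting
SHOW track_vacuum_statistics;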

> I also noticed that my stats for indexes were not being collected while
> a parallel vacuum was running. I fixed it by adding some extra code,
> essentially changes to the parallel_vacuum_process_all_indexes function.
> I used a script like the following to check that everything was correct.
>
> pgbench -d postgres -i -s 10
>
>
> my/inst/bin/pg_basebackup -D ~/backup
>
>
> #psql
>
> --check parallel vacuum statistics
>
> create index accounts_idx1 on pgbench_accounts(bid);
> create index accounts_idx2 on pgbench_accounts(aid, bid);
>
> delete from pgbench_accounts where aid >5;
> set max_parallel_maintenance_workers = 8;
> VACUUM (PARALLEL 3) pgbench_accounts;
>
>
> pg_ctl -D ../postgres_data11 -l logfile stop
>
> rm -rf ../postgres_data/*
>
> cp -r ~/backup/* ~/postgres_data/
>
> pg_ctl -D ../postgres_data11 -l logfile start
>
>
> --check vacuum statistics processed by postmaster only
>
> create index accounts_idx1 on pgbench_accounts(bid);
> create index accounts_idx2 on pgbench_accounts(aid, bid);
>
> delete from pgbench_accounts where aid >5;
> set max_parallel_maintenance_workers = 8;
> VACUUM (PARALLEL 0) pgbench_accounts;
>
> I noticed that rel_blks_read and rel_blks_hit are too small compared to
> the vacuum statistics when the vacuum is not parallel. I suspect that this
> is because the relation statistics have not arrived by that point. You can
> see that they are calculated in my patch like this:
>
> report->blks_fetched =
>     rel->pgstat_info->counts.blocks_fetched - counters->blocks_fetched;
> report->blks_hit =
>     rel->pgstat_info->counts.blocks_hit - counters->blocks_hit;
>
>
> The second thing that bothered me was that some table stats differ in the
> fields total_blks_read, rel_blks_read, and pages_removed. For the buffer
> counters this could be because in a single run we rely on the global
> buffer and WAL statistics, which could explain why there are more of them;
> for pages_removed, however, I have no explanation yet as to what could
> have happened. I am still studying this.
>
> When you have time, please take a look at the patches; I would be glad to
> receive any feedback.
>
I fixed this problem. Now the statistics of parallel and non-parallel
vacuums are almost equal; see the attached files output_vacuum_0_workers and
output_vacuum_3_workers. The results were obtained from the test described
above.
In essence, we needed to add statistics tracking in parallel_vacuum_one_index.
But I noticed another problem there, which I have already fixed.

Vacuum statistics for indexes were being accumulated into the heap vacuum
statistics because of the complexity of vacuum operations: namely, the
vacuum statistics for index cleanup were included in the heap relation's
vacuum statistics.
Vacuum switches from cleaning the table to its indexes and back during its
operation, and we need to take this into account.
So, before cleaning the indexes, we now save the vacuum statistics collected
for the heap, but we send them only after the processing is complete.
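
To double-check that index vacuum statistics no longer leak into the heap
statistics, the per-table and per-index views can be compared after such a
run, for example like this (a sketch; column names as in the queries quoted
below):

select relname, tuples_deleted, total_blks_read
from pg_stat_vacuum_tables where relname like '%accounts%';

select relname, tuples_deleted, total_blks_read
from pg_stat_vacuum_indexes where relname like '%accounts_%';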

> To view statistics:
>
> select vt.relname, total_blks_read AS total_blks_read,
>   total_blks_hit AS total_blks_hit,
>   total_blks_dirtied AS total_blks_dirtied,
>   total_blks_written AS total_blks_written,
>
>   rel_blks_read AS rel_blks_read,
>   rel_blks_hit AS rel_blks_hit,
>
>   pages_deleted AS pages_deleted,
>   tuples_deleted AS tuples_deleted,
>
>   wal_records AS wal_records,
>   wal_fpi AS wal_fpi,
>   wal_bytes AS wal_bytes,
>
>   blk_read_time AS blk_read_time,
>   blk_write_time AS blk_write_time,
>
>   delay_time AS delay_time,
>   total_time AS total_time
>   FROM pg_stat_get_vacuum_indexes vt, pg_class c
> WHERE (vt.relname='accounts_idx1' or vt.relname='accounts_idx2' or
> vt.relname = 'pgbench_accounts_pkey') AND vt.relid = c.oid;
>
>
> select stats.relname,stats.total_blks_read AS total_blks_read,
>   stats.total_blks_hit AS total_blks_hit,
>   stats.total_blks_dirtied AS total_blks_dirtied,
>   stats.total_blks_written AS total_blks_written,
>
>   stats.rel_blks_read AS rel_blks_read,
>   stats.rel_blks_hit AS rel_blks_hit,
>
>   stats.pages_scanned AS pages_scanned,
>   stats.pages_removed AS pages_removed,
>   stats.pages_frozen AS pages_frozen,
>   stats.pages_all_visible AS pages_all_visible,
>   stats.tuples_deleted AS tuples_deleted,
>   stats.tuples_frozen AS tuples_frozen,
>   stats.dead_tuples AS dead_tuples,
>
>   stats.index_vacuum_count AS index_vacuum_count,
>   stats.wal_records AS wal_records,
>   stats.wal_fpi AS wal_fpi,
>   stats.wal_bytes AS wal_bytes,
>
>   stats.blk_read_time AS blk_read_time,
>   stats.blk_write_time AS blk_write_time,
>
>   stats.delay_time AS delay_time,
>   stats.total_time AS total_time from pg_stat_vacuum_tables stats,
> pg_stat_all_tables  WHERE stats.relname = 'pgbench_accounts' and
> stats.relid = pg_stat_all_tables.relid;
>
>
Just in case, I will mention that during the test I used simpler queries:

select * from pg_stat_vacuum_tables where relname like '%accounts%';

select * from pg_stat_vacuum_indexes where relname like '%accounts_%';

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
output_vacuum_0_workers text/plain 2.9 KB
output_parallel_3_workers text/plain 2.9 KB
v16-0001-Machinery-for-grabbing-an-extended-vacuum-statistics.patch text/x-patch 71.2 KB
v16-0002-Machinery-for-grabbing-an-extended-vacuum-statistics.patch text/x-patch 64.1 KB
v16-0003-Machinery-for-grabbing-an-extended-vacuum-statistics.patch text/x-patch 36.4 KB
v16-0004-Add-documentation-about-the-system-views-that-are-us.patch text/x-patch 24.5 KB
