Re: Vacuum statistics

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jim Nasby <jnasby(at)upgrade(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, a(dot)lepikhov(at)postgrespro(dot)ru, Sami Imseih <samimseih(at)gmail(dot)com>
Subject: Re: Vacuum statistics
Date: 2025-01-13 13:26:57
Message-ID: 0b4eefc7-4c38-4caa-b2ca-a4c75dd7dd12@postgrespro.ru
Lists: pgsql-hackers

I noticed that the cfbot is failing; the cause seems to be a missing
parameter in src/backend/utils/misc/postgresql.conf.sample. I added it,
which should fix the failure.
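For context, the sample-file entry would be roughly of this shape (the
GUC name and default follow the discussion below; the comment text is
illustrative, not the patch's exact wording):

```
#track_vacuum_statistics = off		# collect extended vacuum statistics
					# for the relations of a database
```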

On 13.01.2025 14:19, Alena Rybakina wrote:
>
> Hi, all! I updated the patches and I solved the problems with parallel
> vacuum.
>
> On 19.12.2024 13:37, Alena Rybakina wrote:
>>
>> Hi! I updated patch.
>>
>> On 02.12.2024 23:12, Alena Rybakina wrote:
>>> On 02.12.2024 11:27, Alexander Korotkov wrote:
>>>> Hi, Alena!
>>>>
>>>> On Wed, Nov 13, 2024 at 6:21 PM Alena Rybakina
>>>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>>>> Updated 0001-v13 attached, as well as the diff between v12 and v13.
>>>>>
>>>>> Thank you)
>>>>>
>>>>> And I agree with your changes. And included them in patches.
>>>> Thank you for the updated patchset. Some points from me.
>>>>
>>>> * I've read the previous discussion on how important to keep all these
>>>> fields regarding vacuum statistics including points by Andrei and Jim.
>>>> It still worries me that the statistics volume is going to grow about
>>>> 3 times, but I don't have a particular proposal for a more granular
>>>> approach. I wonder if you could propose something.
>>
>> I propose to solve this with a GUC (track_vacuum_statistics) that
>> allows us to disable statistics collection or enable it again when
>> needed. These statistics are useful when collected over a certain
>> period of time to watch the dynamics of change, so I think such a
>> switch will not hurt here.
>>
>> I also added a new statistic, vm_new_visible_frozen_pages, due to
>> commit dc6acfd, and renamed some of my statistics: frozen_pages to
>> vm_new_frozen_pages and all_visible_pages to vm_new_visible_pages. I
>> also added the statistics missed_tuples and missed_pages. Both are
>> necessary to account for how many tuples were not cleaned by vacuum
>> because a cleanup lock could not be acquired on a heap page. The
>> second statistic, missed_pages, lets you track whether this keeps
>> happening on one particular page or not.
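>> As a hypothetical check, one could look for relations where cleanup
>> locks were repeatedly missed (view and column names as used elsewhere
>> in this thread; treat them as illustrative):

```
select relname, missed_tuples, missed_pages
from pg_stat_vacuum_tables
where missed_pages > 0;
```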
>>
>> This information can indicate, for example, that data may be corrupted
>> somewhere or that something is going wrong in the database's
>> operation.
>>
> I changed the purpose of the GUC. By default, database-level vacuum
> statistics are always collected, and the track_vacuum_statistics GUC
> enables the collection of extended statistics for the relations of a
> given database.
> This is done to strike a balance between the memory allocated for
> storing statistics and having the monitoring needed to track the state
> of vacuum operations at hand.
> Besides the fact that a system has far fewer databases than relations,
> vacuum statistics for a database contain fewer fields than those for
> relations, but they are enough to indicate that something may be wrong
> in the system and prompt the administrator to enable extended
> monitoring for relations.
>
> By default, the GUC is disabled.
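> As a usage sketch (assuming the GUC is settable at the system level,
> which the patch may or may not allow):

```
-- hypothetical usage; the patch may restrict where this GUC can be set
SHOW track_vacuum_statistics;
ALTER SYSTEM SET track_vacuum_statistics = on;
SELECT pg_reload_conf();
```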
>
>> I also noticed that my stats for indexes were not being collected
>> while a parallel vacuum was running. I fixed it by adding some extra
>> code that essentially captures the changes made in the
>> parallel_vacuum_process_all_indexes function. I used a script like
>> this to check that everything was correct.
>>
>> pgbench -d postgres -i -s 10
>>
>>
>> my/inst/bin/pg_basebackup -D ~/backup
>>
>>
>> #psql
>>
>> --check parallel vacuum statistics
>>
>> create index accounts_idx1 on pgbench_accounts(bid);
>> create index accounts_idx2 on pgbench_accounts(aid, bid);
>>
>> delete from pgbench_accounts where aid >5;
>> set max_parallel_maintenance_workers = 8;
>> VACUUM (PARALLEL 3) pgbench_accounts;
>>
>>
>> create index accounts_idx1 on pgbench_accounts(bid);
>> create index accounts_idx2 on pgbench_accounts(aid, bid);
>>
>> delete from pgbench_accounts where aid >5;
>> set max_parallel_maintenance_workers = 8;
>> VACUUM (PARALLEL 3) pgbench_accounts;
>>
>>
>> pg_ctl -D ../postgres_data11 -l logfile stop
>>
>> rm -rf ../postgres_data/*
>>
>> cp -r ~/backup/* ~/postgres_data/
>>
>> pg_ctl -D ../postgres_data11 -l logfile start
>>
>>
>> --check vacuum statistics processed by postmaster only
>>
>> create index accounts_idx1 on pgbench_accounts(bid);
>> create index accounts_idx2 on pgbench_accounts(aid, bid);
>>
>> delete from pgbench_accounts where aid >5;
>> set max_parallel_maintenance_workers = 8;
>> VACUUM (PARALLEL 0) pgbench_accounts;
>>
>> I noticed that rel_blks_read and rel_blks_hit are too small compared
>> to the vacuum statistics when the vacuum is not parallel. I suspect
>> that this is because the relation statistics from the workers had not
>> arrived by that time. You can see how they are calculated in my patch:
>>
>> report->blks_fetched =
>>     rel->pgstat_info->counts.blocks_fetched - counters->blocks_fetched;
>> report->blks_hit =
>>     rel->pgstat_info->counts.blocks_hit - counters->blocks_hit;
>>
>>
>> The second thing that bothered me was that some table stats differ in
>> the total_blks_read, rel_blks_read, and pages_removed fields. For the
>> buffer counters this could be because in a parallel run we rely on the
>> global buffer usage statistics, which could explain why there are more
>> of them; but for pages_removed I have no explanation yet for what
>> could have happened. I am still studying this.
>>
>> When you have time, please take a look at the patches; I will be glad
>> to receive any feedback.
>>
> I fixed this problem. Now the statistics of parallel and non-parallel
> vacuums are almost equal. See the files output_vacuum_0_workers and
> output_vacuum_3_workers; the results were obtained from the test
> described above.
> In essence, we needed to add statistics tracking in
> parallel_vacuum_one_index. But I also noticed another problem, which I
> have already fixed.
>
> Vacuum statistics for indexes were being accumulated into the heap
> vacuum statistics because of the complexity of vacuum operations:
> vacuum switches between cleaning the table and its indexes during its
> operation, so statistics for index cleanup ended up included in the
> heap relation's vacuum statistics, and we need to take this into
> account.
> So, before cleaning the indexes, we now save the vacuum statistics
> collected for the heap, but we send them only after processing is
> complete.
>
>> To view statistics:
>>
>> select vt.relname, total_blks_read, total_blks_hit,
>>   total_blks_dirtied, total_blks_written,
>>
>>   rel_blks_read, rel_blks_hit,
>>
>>   pages_deleted, tuples_deleted,
>>
>>   wal_records, wal_fpi, wal_bytes,
>>
>>   blk_read_time, blk_write_time,
>>
>>   delay_time, total_time
>> FROM pg_stat_get_vacuum_indexes vt, pg_class c
>> WHERE vt.relname IN ('accounts_idx1', 'accounts_idx2',
>> 'pgbench_accounts_pkey') AND vt.relid = c.oid;
>>
>>
>> select stats.relname, stats.total_blks_read, stats.total_blks_hit,
>>   stats.total_blks_dirtied, stats.total_blks_written,
>>
>>   stats.rel_blks_read, stats.rel_blks_hit,
>>
>>   stats.pages_scanned, stats.pages_removed, stats.pages_frozen,
>>   stats.pages_all_visible, stats.tuples_deleted, stats.tuples_frozen,
>>   stats.dead_tuples,
>>
>>   stats.index_vacuum_count, stats.wal_records, stats.wal_fpi,
>>   stats.wal_bytes,
>>
>>   stats.blk_read_time, stats.blk_write_time,
>>
>>   stats.delay_time, stats.total_time
>> FROM pg_stat_vacuum_tables stats, pg_stat_all_tables
>> WHERE stats.relname = 'pgbench_accounts'
>>   AND stats.relid = pg_stat_all_tables.relid;
>>
>>
> Just in case, I'll note that during the test I used simpler queries:
>
> select * from pg_stat_vacuum_tables where relname like '%accounts%';
>
> select * from pg_stat_vacuum_indexes where relname like '%accounts_%';
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional

--
Regards,
Alena Rybakina
Postgres Professional

Attachment Content-Type Size
v17-0001-Machinery-for-grabbing-an-extended-vacuum-statistics.patch text/x-patch 71.7 KB
v17-0002-Machinery-for-grabbing-an-extended-vacuum-statistics.patch text/x-patch 64.1 KB
v17-0003-Machinery-for-grabbing-an-extended-vacuum-statistics.patch text/x-patch 36.4 KB
v17-0004-Add-documentation-about-the-system-views-that-are-us.patch text/x-patch 24.5 KB
