Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2025-01-14 19:51:05
Message-ID: CAAKRu_brbq5ufHU0gAiJfUyOBw7nCNt4Di=fzUUE0Pzk7hr8eA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 13, 2025 at 5:37 PM Alena Rybakina
<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>
> Thank you for working on this patch. Without this explanation, it is difficult to understand what is happening, to put it mildly.

Thanks for the review! I've incorporated most of your suggestions into the attached v7.

> The first of them is related to the fact that vacuum will not clean tuples referenced in indexes, since it was previously unable to take a cleanup lock on the index. You can look at the increment of missed_dead_tuples and vacrel->missed_dead_pages in the lazy_scan_noprune function. That is, these are absolutely dead tuples for vacuum that it simply could not clean.

I had mentioned that if a (non-aggressive) vacuum cannot get a cleanup
lock on a page, it will skip pruning and freezing. I have expanded the
note to mention that this means it will not remove those dead tuples
or index entries.
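
As a rough illustration of that note (a toy Python model, not code from
the patch or from vacuumlazy.c): the Page and VacuumStats classes and
the scan_pages function are hypothetical, and only the counter names
missed_dead_tuples and missed_dead_pages are borrowed from the review
comment above. It covers the non-aggressive case only.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a heap page."""
    dead_tuples: int
    cleanup_lock_available: bool


@dataclass
class VacuumStats:
    """Hypothetical counters; the 'missed' names follow the review comment."""
    removed_dead_tuples: int = 0
    missed_dead_tuples: int = 0
    missed_dead_pages: int = 0


def scan_pages(pages, stats=None):
    """Model a non-aggressive vacuum's pass over the heap.

    A page whose cleanup lock cannot be taken is neither pruned nor
    frozen, so its dead tuples stay in place and are only counted.
    """
    if stats is None:
        stats = VacuumStats()
    for page in pages:
        if page.cleanup_lock_available:
            # Pruning happens: the dead tuples on this page are removed.
            stats.removed_dead_tuples += page.dead_tuples
            page.dead_tuples = 0
        elif page.dead_tuples > 0:
            # No cleanup lock: skip pruning and freezing, record the miss.
            stats.missed_dead_tuples += page.dead_tuples
            stats.missed_dead_pages += 1
    return stats
```

In this model, a page that is skipped keeps its dead tuples until a
later vacuum manages to take the cleanup lock.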

> Secondly, I think it is worth mentioning the point at which vacuum urgently starts cleaning the heap relation when there is a threat of wraparound. At this point, it skips the index processing phase and heap relation truncation.

I've added failsafe to the list of reasons why we might skip phases II and III.
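
A minimal sketch of the failsafe case, assuming a table that has indexes
and dead items to remove. The phases_to_run function is hypothetical,
and the phase descriptions (I: heap scan with pruning and freezing,
II: index vacuuming, III: heap vacuuming) are assumptions, since the
phases are not spelled out in this message. The other reasons for
skipping phases are not modeled.

```python
from typing import List


def phases_to_run(failsafe_triggered: bool) -> List[str]:
    """Return the phases a vacuum would run in this toy model.

    Assumes the table has indexes and phase I found dead items.
    Phase descriptions are illustrative, not taken from the patch.
    """
    phases = ["I: scan heap, prune and freeze"]
    if failsafe_triggered:
        # Failsafe: skip phases II and III entirely.
        return phases
    phases.append("II: vacuum indexes")
    phases.append("III: vacuum heap")
    return phases
```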

> Thirdly, the FreeSpaceMap is updated each time index and table cleaning completes (after the lazy_vacuum function) and after the heap pruning stage (the lazy_scan_prune function). Maybe you should add it.

I've added a sentence about this. It looks a bit awkward by itself,
but it doesn't really go with the other paragraphs. Anyway, I think it
is probably fine.

> I think it is possible to add additional information about parallel vacuum. Firstly, workers are generated for each index and perform its cleaning. Some indexes are defined by vacuum as unsafe for processing by a parallel worker and can be processed only by a postmaster (or leader). These are indexes that do not support parallel bulk-deletion or parallel cleanup (see the parallel_vacuum_index_is_parallel_safe function).

I hesitated to add too much about parallel index vacuuming to
vacuumlazy.c. I have added a line which mentions that manual vacuums
may vacuum indexes in parallel and points to vacuumparallel.c for
more info.

> I noticed an interesting point, though I don’t know if it is necessary to write about it. For me it was not obvious, and it was informative, that the buffer and WAL statistics for indexes processed by workers are reported separately (in pvs->buffer_usage and pvs->wal_usage).

This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.

Thanks again for your close reading and detailed thoughts!

- Melanie

Attachment Content-Type Size
v7-0002-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch text/x-patch 39.3 KB
v7-0001-Add-more-general-summary-to-vacuumlazy.c.patch text/x-patch 4.0 KB
