Re: Parallel heap vacuum

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2024-12-09 03:19:10
Message-ID: CAHut+PtnyLVkgg7BsfXy0ciVeyCBaXNRSSi0h8AVdx9cTL9_ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Sawada-San.

FWIW, here are the remainder of my review comments for the patch v4-0001

======
src/backend/access/heap/vacuumlazy.c

lazy_scan_heap:

2.1.

+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum heap with parallel workers.
+ */

/with/using/

~~~

2.2.
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);

The do_lazy_scan_heap() returns a boolean and according to that
function comment it should always be true if it is not using the
parallel heap scan. So should we get the function return value here
and Assert that it is true?

~~~

2.3.

Start uppercase even for all the single line comments for consistency
with exasiting code.

e.g.
+ /* report that everything is now scanned */

e.g
+ /* now we can compute the new value for pg_class.reltuples */

e.g.
+ /* report all blocks vacuumed */

~~~

heap_vac_scan_next_block_parallel:

2.4.
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)

The function comment should explain the return value.

~~~

2.5.
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+
+ if (vacrel->aggressive)
+ break;

Unnecessary whitespace.

~~~

dead_items_alloc:

2.6.
+ /*
+ * We initialize parallel heap scan/vacuuming or index vacuuming
+ * or both based on the table size and the number of indexes. Note
+ * that only one worker can be used for an index, we invoke
+ * parallelism for index vacuuming only if there are at least two
+ * indexes on a table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);

Is this information misplaced? Why describe here "only one worker" and
"at least two indexes on a table" I don't see anything here checking
those conditions.

~~~

heap_parallel_vacuum_compute_workers:

2.7.
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. This probably needs to be a good deal more
+ * sophisticated, but we need something here for now. Note that the
+ * upper limit of the min_parallel_table_scan_size GUC is chosen to
+ * prevent overflow here.
+ */

The "This probably needs to..." part maybe should have an "XXX" marker
in the comment which AFAIK is used to highlight current decisions and
potential for future changes.

~~~

heap_parallel_vacuum_initialize:

2.8.
There is inconsistent capitalization of the single-line comments in
this function. The same occurs in many functions in this file. but it
is just a bit more obvious in this one. Please see all the others too.

~~~

parallel_heap_complete_unfinised_scan:

2.9.
+static void
+parallel_heap_complete_unfinised_scan(LVRelState *vacrel)

TYPO in function name /unfinised/unfinished/

~~~

2.10.
+ if (!wstate->maybe_have_blocks)
+
+ continue;

Unnecessary blank line.

~~~

2.11.
+
+ /* Attache the worker's scan state and do heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);

/Attache/Attach/

~~~

2.12.
+ /*
+ * We don't need to gather the scan statistics here because statistics
+ * have already been accumulated the leaders statistics directly.
+ */

"have already been accumulated the leaders" -- missing word there somewhere?

~~~

do_parallel_lazy_scan_heap:

2.13.
+ /*
+ * If the heap scan paused in the middle of the table due to full of
+ * dead_items TIDs, perform a round of index and heap vacuuming.
+ */
+ if (!scan_done)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_blkno should have already been updated when gathering
+ * statistics
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_blkno;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launcher workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reach the end of the table */
+ break;

Instead of:

if (!scan_done)
{
<other code ...>
continue;
}

break;

Won't it be better to refactor like:

SUGGESTION
if (scan_done)
break;

<other code...>

~~~

2.14.
+ /*
+ * The parallel heap vacuum finished, but it's possible that some workers
+ * have allocated blocks but not processed yet. This can happen for
+ * example when workers exit because of full of dead_items TIDs and the
+ * leader process could launch fewer workers in the next cycle.
+ */

There seem to be some missing words:

e.g. /not processed yet./not processed them yet./
e.g. /because of full of dead_items/because they are full of dead_items/

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Srinath Reddy Sadipiralla 2024-12-09 04:44:06 Re: Why we need to check for local buffers in BufferIsExclusiveLocked and BufferIsDirty?
Previous Message Andrei Lepikhov 2024-12-09 02:35:57 Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table