Re: Parallel heap vacuum

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2024-12-06 06:16:33
Message-ID: CAHut+PsA=9UOFKd52A41DSTgeUreMuuweWHmxsokqLzTMao=Rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Sawada-San,

I started to look at patch v4-0001 in this thread.

It is quite a big patch so this is a WIP, and these below are just the
comments I have so far.

======
src/backend/access/heap/vacuumlazy.c

1.1.
+/*
+ * Relation statistics collected during heap scanning and need to be
shared among
+ * parallel vacuum workers.
+ */
+typedef struct LVRelScanStats

The comment wording is not quite right.

/Relation statistics collected during heap scanning/Relation
statistics that are collected during heap scanning/

~~~

1.2
+/*
+ * Struct for information that need to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared

The comment wording is not quite right.

/that need to be shared/that needs to be shared/

~~~

1.3.
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;

If 'pscandesc' is described as 'shared among parallel workers', should
that field be within 'PHVShared' instead?

~~~

1.4.
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->frozen_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
- /* dead_items_alloc allocates vacrel->dead_items later on */
+ scan_stats = palloc(sizeof(LVRelScanStats));
+ scan_stats->scanned_pages = 0;
+ scan_stats->removed_pages = 0;
+ scan_stats->frozen_pages = 0;
+ scan_stats->lpdead_item_pages = 0;
+ scan_stats->missed_dead_pages = 0;
+ scan_stats->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ scan_stats->tuples_deleted = 0;
+ scan_stats->tuples_frozen = 0;
+ scan_stats->lpdead_items = 0;
+ scan_stats->live_tuples = 0;
+ scan_stats->recently_dead_tuples = 0;
+ scan_stats->missed_dead_tuples = 0;
+
+ vacrel->scan_stats = scan_stats;

1.4a.
Or, maybe just palloc0 this and provide a comment to say all counters
have been zapped to 0.

~

1.4b.
Maybe you don't need that 'scan_stats' variable; just assign the
palloc0 directly to the field instead.

~~~

1.5.
- vacrel->missed_dead_tuples = 0;
+ /* dead_items_alloc allocates vacrel->dead_items later on */

The patch seems to have moved this "dead_items_alloc" comment to now
be below the "Allocate/initialize output statistics state" stuff (??).

======
src/backend/commands/vacuumparallel.c

parallel_vacuum_init:

1.6.
int parallel_workers = 0;
+ int nworkers_table;
+ int nworkers_index;

The local vars and function params are named like this (here and in
other functions). But the struct field names say 'nworkers_for_XXX'
e.g.
shared->nworkers_for_table = nworkers_table;
shared->nworkers_for_index = nworkers_index;

It may be better to use these 'nworkers_for_table' and
'nworkers_for_index' names consistently everywhere.

~~~

parallel_vacuum_compute_workers:

1.7.
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
+
+ *nworkers_table = 0;
+ *nworkers_index = 0;

The local variables 'parallel_workers_table' and
'parallel_workers_index; are hardly needed because AFAICT those
results can be directly assigned to *nworkers_table and
*nworkers_index.

~~~

parallel_vacuum_process_all_indexes:

1.8.
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);

I don't know if it is feasible or even makes sense to change, but
somehow it seemed strange that the 'num_index_scans' field is not
co-located with the 'num_table_scans' in the ParallelVacuumState. If
this is doable, then lots of functions also would no longer need to
pass 'num_index_scans' since they are already passing 'pvs'.

~~~

parallel_vacuum_table_scan_begin:

1.9.
+ (errmsg(ngettext("launched %d parallel vacuum worker for table
processing (planned: %d)",
+ "launched %d parallel vacuum workers for table processing (planned: %d)",
+ pvs->pcxt->nworkers_launched),

Isn't this the same as errmsg_plural?

~~~

1.10.

+/* Return the array of indexes associated to the given table to be vacuumed */
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)

Even though the function comment can fit on one line it is nicer to
use a block-style comment with a period, like below. It then will be
consistent with other function comments (e.g.
parallel_vacuum_table_scan_end, parallel_vacuum_process_table, etc).
There are multiple places that this review comment can apply to.

(also typo /associated to/associated with/)

SUGGESTION
/*
* Return the array of indexes associated with the given table to be vacuumed.
*/

~~~

parallel_vacuum_get_nworkers_table:
parallel_vacuum_get_nworkers_index:

1.11.
+/* Return the number of workers for parallel table vacuum */
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_table;
+}
+
+/* Return the number of workers for parallel index processing */
+int
+parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_index;
+}
+

Are these functions needed? AFAICT, they are called only from macros
where it seems just as easy to reference the pvs fields directly.

~~~

parallel_vacuum_process_table:

1.12.
+/*
+ * A parallel worker invokes table-AM specified vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ Assert(VacuumActiveNWorkers);

Maybe here also we should Assert(pvs.shared->do_vacuum_table_scan);

~~~

1.13.
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ ErrorContextCallback errcallback;
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+ }

There are still some functions following this code (like
'shm_toc_lookup') that could potentially raise ERRORs. But, now the
error_context_stack is getting assigned/reset earlier than previously
was the case. Is that going to be a potential problem?

======
src/include/access/tableam.h

1.14.
+ /*
+ * Compute the amount of DSM space AM need in the parallel table vacuum.
+ *

Maybe reword this comment to be more like table_parallelscan_estimate.

SUGGESTION
Estimate the size of shared memory that the parallel table vacuum needs for AM.

~~~

1.15.
+/*
+ * Estimate the size of shared memory needed for a parallel vacuum scan of this
+ * of this relation.
+ */
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Initialize shared memory area for a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Start a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_scan(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_scan_worker(rel, pvs, pwcxt);
+}
+

All of the "Callbacks for parallel table vacuum." had comments saying
"Not called if parallel table vacuum is disabled.". So, IIUC that
means all of these table_parallel_vacuum_XXX functions (other than the
compute_workers one) could have Assert(nworkers > 0); just to
double-check that is true.

~~~

table_paralle_vacuum_compute_workers:

1.16.
+static inline int
+table_paralle_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}

Typo in function name. /paralle/parallel/

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-12-06 06:48:48 Re: Considering fractional paths in Append node
Previous Message Thomas Munro 2024-12-06 06:01:06 Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)