Re: REINDEX not updating partition progress

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: REINDEX not updating partition progress
Date: 2024-07-19 03:17:35
Message-ID: Zpnaz_gHMyf0xsso@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 12, 2024 at 11:07:49PM +0100, Ilya Gladyshev wrote:
> While working on CIC for partitioned tables [1], I noticed that REINDEX for
> partitioned tables is not tracking keeping progress of partitioned tables,
> so I'm creating a separate thread for this fix as suggested.

This limitation is not a bug, because we already document that
partitions_total and partitions_done are 0 during a REINDEX. Compared
to CREATE INDEX, a REINDEX is a bit wilder as it would work on all the
indexes for all the partitions, providing this information makes
sense.

Agreed that this could be better, but what's now on HEAD is not wrong
either.

> The way REINDEX for partitioned tables works now ReindexMultipleInternal
> treats every partition as an independent table without keeping track of
> partition_done/total counts, because this is just generic code for
> processing multiple tables. The patch addresses that by passing down the
> knowledge a flag to distinguish partitions from independent tables
> in ReindexMultipleInternal and its callees. I also noticed that the
> partitioned CREATE INDEX progress tracking could also benefit from
> progress_index_partition_done function that zeroizes params in addition to
> incrementing the counter, so I applied it there as well.

Still it does not do it because we know that the fields are going to
be overwritten pretty quickly anyway, no? For now, we could just do
without progress_index_partition_done(), I guess, keeping it to the
incrementation of PROGRESS_CREATEIDX_PARTITIONS_DONE. There's an
argument that this makes the code slightly easier to follow, with less
wrappers around the progress reporting.

+ int progress_params[3] = {
+ PROGRESS_CREATEIDX_COMMAND,
+ PROGRESS_CREATEIDX_PHASE,
+ PROGRESS_CREATEIDX_PARTITIONS_TOTAL
+ };
+ int64 progress_values[3];
+ Oid heapId = relid;

Rather than having new code blocks, let's use a style consistent with
DefineIndex() where we have the pgstat_progress_update_multi_param(),
with a single {} block.

Adding the counter increment at the end of the loop in
ReindexMultipleInternal() is a good idea. It considers both the
concurrent and non-concurrent cases.

+ progress_values[2] = list_length(partitions);
+ pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);

Hmm. Is setting the relid only once for pg_stat_progress_create_index
the best choice there is? Could it be better to report the partition
OID instead? Let's remember that indexes may have been attached with
names inconsistent with the partitioned table's index. It is a bit
confusing to stick to the relid all the partitioned table all the
time, for all the indexes of all the partitions reindexed. Could it
be better to actually introduce an entirely new field to the progress
table? What I would look for is more information:
1) the partitioned table OID on which the REINDEX runs
2) the partition table being processed
3) the index OID being processed (old and new for concurrent case).

The patch sets 1) to the OID of the partitioned table, lacks 2) and
sets 3) each time an index is rebuilt.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-07-19 03:47:02 Re: Expand applicability of aggregate's sortop optimization
Previous Message Tatsuo Ishii 2024-07-19 03:06:22 Re: documentation structure