Re: REINDEX not updating partition progress

From: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: REINDEX not updating partition progress
Date: 2024-07-21 00:49:39
Message-ID: 779baf7f-6fc7-4431-a08e-eec63d84c6a3@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> 19 июля 2024 г., в 04:17, Michael Paquier <michael(at)paquier(dot)xyz>
> написал(а):
>
> On Fri, Jul 12, 2024 at 11:07:49PM +0100, Ilya Gladyshev wrote:
>> While working on CIC for partitioned tables [1], I noticed that
>> REINDEX for
>> partitioned tables is not tracking keeping progress of partitioned
>> tables,
>> so I'm creating a separate thread for this fix as suggested.
>
> This limitation is not a bug, because we already document that
> partitions_total and partitions_done are 0 during a REINDEX.  Compared
> to CREATE INDEX, a REINDEX is a bit wilder as it would work on all the
> indexes for all the partitions, providing this information makes
> sense.
>
> Agreed that this could be better, but what's now on HEAD is not wrong
> either.

Thanks for pointing it out, I didn’t notice it. You’re right, this not a
bug, but an improvement. Will remove the bits from the documentation as
well.

>
> Still it does not do it because we know that the fields are going to
> be overwritten pretty quickly anyway, no?  For now, we could just do
> without progress_index_partition_done(), I guess, keeping it to the
> incrementation of PROGRESS_CREATEIDX_PARTITIONS_DONE.  There's an
> argument that this makes the code slightly easier to follow, with less
> wrappers around the progress reporting.

The use-case that I thought this would improve is REINDEX CONCURRENT,
where data from later stages of the previous partitions would linger for
quite a while before it gets to the same stage of the current partition.
I don’t think this is of big importance, so I’m ok with making code
simpler and leaving it out.

>
> +intprogress_params[3] = {
> +PROGRESS_CREATEIDX_COMMAND,
> +PROGRESS_CREATEIDX_PHASE,
> +PROGRESS_CREATEIDX_PARTITIONS_TOTAL
> +};
> +int64progress_values[3];
> +OidheapId = relid;
>
> Rather than having new code blocks, let's use a style consistent with
> DefineIndex() where we have the pgstat_progress_update_multi_param(),
> with a single {} block.

Fixed.

> Adding the counter increment at the end of the loop in
> ReindexMultipleInternal() is a good idea.  It considers both the
> concurrent and non-concurrent cases.

Another reason to do so is that reindex_relation calls itself
recursively, so it'd require some additional mechanism so that it
wouldn't increment multiple times per partition.
>
> +   progress_values[2] = list_length(partitions);
> +   pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);
>
> Hmm.  Is setting the relid only once for pg_stat_progress_create_index
> the best choice there is?  Could it be better to report the partition
> OID instead?

Technically, we could do this, but it looks like no other commands do it
and currently there’s no API to do it without erasing the rest of
progress information.

>  Let's remember that indexes may have been attached with
> names inconsistent with the partitioned table's index.  It is a bit
> confusing to stick to the relid all the partitioned table all the
> time, for all the indexes of all the partitions reindexed.

I agree that it’s a bit confusing especially because documentation gives
no hints about it. Judging by documentation, I would expect relid and
index_relid to correspond to the same table. However, if we say that
this field represents the oid of the relation on which the command was
invoked, then I think it does make sense. Edited documentation to say
that in the new patch.

>  Could it
> be better to actually introduce an entirely new field to the progress
> table?  What I would look for is more information:
> 1) the partitioned table OID on which the REINDEX runs
> 2) the partition table being processed
> 3) the index OID being processed (old and new for concurrent case).
>
> The patch sets 1) to the OID of the partitioned table, lacks 2) and
> sets 3) each time an index is rebuilt.
> —
> Michael

I like this approach more, but I’m not sure whether adding another field
for partition oid is worth it, since we already have index_relid that we
can use to join with pg_index to get that. On the other hand,
index_relid is missing during regular CREATE INDEX, so this new field
could be useful to indicate which table is being indexed in this case.
I'm on the fence about this, attached this as a separate patch, if you
think it's a good idea.

Thank you for the review!

Attachment Content-Type Size
v2-0002-partition_relid-column-for-create-index-progress.patch text/x-patch 5.7 KB
v2-0001-make-REINDEX-track-partition-progress.patch text/x-patch 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-07-21 01:43:35 Re: Add mention of execution time memory for enable_partitionwise_* GUCs
Previous Message Tomas Vondra 2024-07-20 23:32:46 Re: Report search_path value back to the client.