Re: 回复:how to create index concurrently on partitioned table

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: 回复:how to create index concurrently on partitioned table
Date: 2020-09-01 05:51:58
Message-ID: 20200901055158.GH3511@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 09, 2020 at 06:44:23PM -0500, Justin Pryzby wrote:
> Thanks for looking.

The REINDEX patch is progressing its way, so I have looked a bit at
the part for CIC.

Visibly, the case of multiple partition layers is not handled
correctly. Here is a sequence that gets broken:
CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
CREATE TABLE child_0_10 PARTITION OF parent_tab
FOR VALUES FROM (0) TO (10);
CREATE TABLE child_10_20 PARTITION OF parent_tab
FOR VALUES FROM (10) TO (20);
CREATE TABLE child_20_30 PARTITION OF parent_tab
FOR VALUES FROM (20) TO (30);
INSERT INTO parent_tab VALUES (generate_series(0,29));
CREATE TABLE child_30_40 PARTITION OF parent_tab
FOR VALUES FROM (30) TO (40)
PARTITION BY RANGE(id);
CREATE TABLE child_30_35 PARTITION OF child_30_40
FOR VALUES FROM (30) TO (35);
CREATE TABLE child_35_40 PARTITION OF child_30_40
FOR VALUES FROM (35) TO (40);
INSERT INTO parent_tab VALUES (generate_series(30,39));
CREATE INDEX CONCURRENTLY parent_index ON parent_tab (id);

This fails as follows:
ERROR: XX000: unrecognized node type: 2139062143
LOCATION: copyObjectImpl, copyfuncs.c:5718

And the problem visibly comes from some handling with the second level
of partitions, child_30_40 in my example above. Even with that, this
outlines a rather severe problem in the patch: index_set_state_flags()
flips indisvalid on/off using a non-transactional update (see the use
of heap_inplace_update), meaning that if we fail in the middle of the
operation we may finish with a partition index tree where some of the
indexes are valid and some of them are invalid. In order to make this
logic consistent, I am afraid that we will need to do two things:
- Change index_set_state_flags() so as it uses a transactional
update. That's something I would like to change for other reasons,
like making sure that the REPLICA IDENTITY of a parent relation is
correctly reset when dropping a replica index.
- Make all the indexes of the partition tree valid in the *same*
sub-transaction.
You can note that this case is different than a concurrent REINDEX,
because in this case we just do an in-place change between the old and
new index, meaning that even if there is a failure happening while
processing, we may have some invalid indexes, but there are still
valid indexes attached to the partition tree, at any time.

+ MemoryContext oldcontext = MemoryContextSwitchTo(ind_context);
PartitionDesc partdesc = RelationGetPartitionDesc(rel);
int nparts = partdesc->nparts;
+ char *relname = pstrdup(RelationGetRelationName(rel));
Er, no. We should not play with the relation cache calls in a private
memory context. I think that this needs much more thoughts.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-09-01 05:56:35 Re: Use T_IntList for uint32
Previous Message Fujii Masao 2020-09-01 05:41:09 Re: Remove line length restriction in passwordFromFile()