From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Subject: | CLUSTER on partitioned index |
Date: | 2020-10-28 00:33:12 |
Message-ID: | 20201028003312.GU9241@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Forking this thread, since the existing CFs have been closed.
https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd
On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote:
> > Honestly, I think you're over-thinking and over-engineering indisclustered.
> >
> > If "clusteredness" was something we offered to maintain across DML, I think
> > that might be important to provide stronger guarantees. As it is now, I don't
> > think this patch is worth changing the catalog definition.
>
> Well, this use case is new because we are discussing the relationship
> of indisclustered across multiple transactions for multiple indexes,
> so I'd rather have this discussion than not, and I have learnt
> the hard way with REINDEX that we should care a lot about the
> consistency of partition trees at any step of the operation.
indisclustered is only used as a default for "CLUSTER" (without USING). The
worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters
a table on the "old" clustered index (that it was already clustered on), which
is what would've happened before running some command which was interrupted.
> Let's
> imagine a simple example here, take this partition tree: p (parent),
> and two partitions p1 and p2. p has two partitioned indexes i and j,
> indexes also present in p1 and p2 as i1, i2, j1 and j2. Let's assume
> that the user has done a CLUSTER on p USING i that completes, meaning
> that i, i1 and i2 have indisclustered set. Now let's assume that the
> user does a CLUSTER on p USING j this time, and that this command
> fails while processing p2, meaning that indisclustered is set for j1,
> i2, and perhaps i or j depending on what the patch does.
I think the state of "indisclustered" at that point is not critical.
The command failed, and the user can re-run it, or ALTER..SET CLUSTER.
Actually, I think the only inconsistent state is if two indexes are both marked
indisclustered.
I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.
Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve
indisclustered, but I can't say that's an issue.
--
Justin
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Implement-CLUSTER-of-partitioned-table.patch | text/x-diff | 16.8 KB |
v1-0002-preserve-indisclustered-on-children-of-clustered-.patch | text/x-diff | 2.9 KB |
v1-0003-Invalidate-indisclustered-when-attaching-uncluste.patch | text/x-diff | 8.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2020-10-28 00:44:32 | DROP INDEX CONCURRENTLY on partitioned index |
Previous Message | Tomas Vondra | 2020-10-27 23:56:23 | Re: Resetting spilled txn statistics in pg_stat_replication |