Re: Partitioning, Identity and Uniqueness (given pg 16 changes)

From: Darryl Green <darryl(dot)green(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Partitioning, Identity and Uniqueness (given pg 16 changes)
Date: 2024-02-19 09:07:44
Message-ID: CADJ=SENJx+Drmzs9hR6-z1FpmA8G0ak2kRa7cfxP7zvWYzKBGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 19 Feb 2024 at 14:23, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 19 Feb 2024 at 16:32, Darryl Green <darryl(dot)green(at)gmail(dot)com> wrote:
> > 2) It would be nice to be able to specify the id as pk on the table
being partitioned (as it was in the non-partitioned definition of the
table) once to document and enforce that the partitions simply inherit the
id pk. This would seem only to need the "partition by" validation to allow
a column not mentioned in partition by clause to be defined as pk or unique
if and only if the pk/unique column is an identity column. Not a big deal
but is this practical/valid?
>
> Unfortunately, it's not as easy as you think. It's not a matter of
> dropping the check that requires all PRIMARY KEY columns are present
> in the PARTITION BY clause. For this to work a *single* index (i.e.
> non-partitioned index) would have to index all partitions.

Now that would be bad. And I do understand that is exactly why in the
general case the existing rule is correct. You snipped my specific
usage/question which was about the uniqueness of a column that was
specified as "id int generated always as identity not null" and relying on
all partitions using the same sequence to populate to guarantee that there
are never(*) duplicates in other partitions.

> Otherwise,
> how would Postgres ensure that the value being inserted doesn't exist
> in some other partition?

It is, in practice, ensured by the fact that it is ensured that all
partitions are using the same IDENTITY source of (unique) key values.
Yes I realise that even with this it is possible to force a value
OVERRIDING SYSTEM VALUE and of course by changing constraints over time. I
can imagine this could make some changes expensive.

>
> The problem with a single index is that it kinda defeats the purpose
> of partitioning, i.e., "my table is large and I want to split it up".

Exactly.

But, given that people who know more than I decided to make IDENTITY work
as a "shared" property of partitions (and invested effort to do it) surely
they didn't mean for it to not be used. It can be used in circumstances
where partitioning by the identity makes sense. It can't be used (without
running afoul of this issue) in the case I described. So either I'm "doing
it wrong" - suggestions welcome, or the "work around" I described of
putting the primary key on each partition explicitly is "the right way to
do it". I assume you are implicitly saying - yes it is?

> Operations such as DETACH PARTITION would have to become more than
> just a metadata operation when you consider having to trawl through
> the index and remove all records belonging to a single partition.

Yes. If there was a single index created, this is one way it would be bad.
Which is what the existing "pk or unique column must be in the partitioning
key" rule is about. It allows relaxing the check and hence the index to per
partition as we know because we are partitioning by that column, that we
only need to check that one partition (index).

My question was whether it would be in any way practical to make the
"specification" that there is a unique or pk constraint on a column of a
partitioned table be treated as an instruction to "trust" the uniqueness
guarantee that is implicit in a IDENTITY column that is using a shared
SEQUENCE (as of the change discussed here) across partitions - so you
don't know which partition a given sequence value will be in (if any) but
you do know it won't be in multiple without, actually, using ANY indexes.
Hence you don't "need" the completely impractical "shared across
partitions" index to enforce uniqueness (well - enforce until someone does
something creative/silly using OVERRIDING SYSTEM VALUE). One could with yet
more special casing make any such OVERRIDING SYSTEM VALUE usage enforce the
constraint - but at some considerable expense - this is where you would
have to check every partition (index) to confirm uniqueness.

I guess if you take my argument far enough you arrive at a position of
never creating an implicit index on any column specified as "id int
generated always as identity unique not null" or similar .. but incurring
the cost of the scan to prove unique in the rare case above - regardless of
partitioning. That wasn't the proposal. Partitioning implies some sort of
tradeoff - this specific one seemed perhaps common enough to support.

>
> It may be possible to still have it work by doing a speculative record
> in the index for the target table then go and check all of the other
> indexes before marking the speculative entry as valid.

It is always valid - except in the OVERRIDING SYSTEM VALUE case - no?

> I think it
> would be very tricky to make it work well, however. Imagine how
> expensive ATTACH PARTITION would be!

True.

> There are probably other race
> conditions I've not thought about too. Likely, we'd get more
> complaints about this being a terrible feature than we do due to the
> fact that it's unsupported.

I'm not sure it is a "terrible feature" in the exact case of unique
IDENTITY columns (which was the only case I expected it to be reasonable
for).
I'm also far from an expert.

>
> David

On Mon, 19 Feb 2024 at 14:23, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 19 Feb 2024 at 16:32, Darryl Green <darryl(dot)green(at)gmail(dot)com> wrote:
> > 2) It would be nice to be able to specify the id as pk on the table
being partitioned (as it was in the non-partitioned definition of the
table) once to document and enforce that the partitions simply inherit the
id pk. This would seem only to need the "partition by" validation to allow
a column not mentioned in partition by clause to be defined as pk or unique
if and only if the pk/unique column is an identity column. Not a big deal
but is this practical/valid?
>
> Unfortunately, it's not as easy as you think. It's not a matter of
> dropping the check that requires all PRIMARY KEY columns are present
> in the PARTITION BY clause. For this to work a *single* index (i.e.
> non-partitioned index) would have to index all partitions. Otherwise,
> how would Postgres ensure that the value being inserted doesn't exist
> in some other partition?
>
> The problem with a single index is that it kinda defeats the purpose
> of partitioning, i.e., "my table is large and I want to split it up".
> Operations such as DETACH PARTITION would have to become more than
> just a metadata operation when you consider having to trawl through
> the index and remove all records belonging to a single partition.
>
> It may be possible to still have it work by doing a speculative record
> in the index for the target table then go and check all of the other
> indexes before marking the speculative entry as valid. I think it
> would be very tricky to make it work well, however. Imagine how
> expensive ATTACH PARTITION would be! There are probably other race
> conditions I've not thought about too. Likely, we'd get more
> complaints about this being a terrible feature than we do due to the
> fact that it's unsupported.
>
> David

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Darryl Green 2024-02-19 09:13:13 Re: Partitioning, Identity and Uniqueness (given pg 16 changes)
Previous Message Ketan Popat 2024-02-19 08:08:10 Re: Partitioning, Identity and Uniqueness (given pg 16 changes)