Re: Fix to enum hashing for dump and restore

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew <pgsqlhackers(at)andrewrepp(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, soumyadeep2007(at)gmail(dot)com, alexandra(dot)wanglei(at)gmail(dot)com
Subject: Re: Fix to enum hashing for dump and restore
Date: 2023-01-25 03:56:10
Message-ID: 2583859.1674618970@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew <pgsqlhackers(at)andrewrepp(dot)com> writes:
> I have discovered a bug in one usage of enums. If a table with hash
> partitions uses an enum as a partitioning key, it can no longer be
> backed up and restored correctly. This is because enums are represented
> simply as oids, and the hash function for enums hashes that oid to
> determine partition distribution. Given the way oids are assigned, any
> dump+restore of a database with such a table may fail with the error
> "ERROR: new row for relation "TABLENAME" violates partition constraint".

Ugh, that was not well thought out :-(. I suppose this isn't a problem
for pg_upgrade, which should preserve the enum value OIDs, but an
ordinary dump/restore will indeed hit this.

> I have written a patch to fix this bug (attached), by instead having the
> hashenum functions look up the enumsortorder ID of the value being
> hashed. These are deterministic across databases, and so allow for
> stable dump and restore.

Unfortunately, I'm not sure those are as deterministic as all that.
They are floats, so there's a question of roundoff error, not to
mention cross-platform variations in what a float looks like. (At the
absolute minimum, I think we'd have to change your patch to force
consistent byte ordering of the floats.) Actually though, roundoff
error wouldn't be a problem for the normal exact-integer values of
enumsortorder. Where it could come into play is with the fractional
values used after you insert a value into the existing sort order.
And then the whole idea fails, because a dump/restore won't duplicate
those fractional values.

Another problem with this approach is that we can't get from here to there
without a guaranteed dump/reload failure, since it's quite unlikely that
the partition assignment will be the same when based on enumsortorder
as it was when based on OIDs. Worse, it also breaks the pg_upgrade case.

I wonder if it'd work to make pg_dump force --load-via-partition-root
mode when a hashed partition key includes an enum.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-25 04:28:35 Re: plpython vs _POSIX_C_SOURCE
Previous Message Andrew 2023-01-25 03:30:06 Fix to enum hashing for dump and restore