Re: Fix to enum hashing for dump and restore

From: Andrew <pgsqlhackers(at)andrewrepp(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, soumyadeep2007(at)gmail(dot)com, alexandra(dot)wanglei(at)gmail(dot)com
Subject: Re: Fix to enum hashing for dump and restore
Date: 2023-01-25 13:23:11
Message-ID: f80347af-12d0-40d3-b8e6-65eb79e6a5fb@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Those are excellent points.
We will investigate adjusting pg_dump behavior,
as this is primarily a dump+restore issue.

Thank you!

-Andrew J Repp (VMware)

On Tue, Jan 24, 2023, at 9:56 PM, Tom Lane wrote:
> Andrew <pgsqlhackers(at)andrewrepp(dot)com> writes:
> > I have discovered a bug in one usage of enums. If a table with hash
> > partitions uses an enum as a partitioning key, it can no longer be
> > backed up and restored correctly. This is because enums are represented
> > simply as oids, and the hash function for enums hashes that oid to
> > determine partition distribution. Given the way oids are assigned, any
> > dump+restore of a database with such a table may fail with the error
> > "ERROR: new row for relation "TABLENAME" violates partition constraint".
>
> Ugh, that was not well thought out :-(. I suppose this isn't a problem
> for pg_upgrade, which should preserve the enum value OIDs, but an
> ordinary dump/restore will indeed hit this.
>
> > I have written a patch to fix this bug (attached), by instead having the
> > hashenum functions look up the enumsortorder ID of the value being
> > hashed. These are deterministic across databases, and so allow for
> > stable dump and restore.
>
> Unfortunately, I'm not sure those are as deterministic as all that.
> They are floats, so there's a question of roundoff error, not to
> mention cross-platform variations in what a float looks like. (At the
> absolute minimum, I think we'd have to change your patch to force
> consistent byte ordering of the floats.) Actually though, roundoff
> error wouldn't be a problem for the normal exact-integer values of
> enumsortorder. Where it could come into play is with the fractional
> values used after you insert a value into the existing sort order.
> And then the whole idea fails, because a dump/restore won't duplicate
> those fractional values.
>
> Another problem with this approach is that we can't get from here to there
> without a guaranteed dump/reload failure, since it's quite unlikely that
> the partition assignment will be the same when based on enumsortorder
> as it was when based on OIDs. Worse, it also breaks the pg_upgrade case.
>
> I wonder if it'd work to make pg_dump force --load-via-partition-root
> mode when a hashed partition key includes an enum.
>
> regards, tom lane
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-01-25 13:29:32 CREATE ROLE bug?
Previous Message shveta malik 2023-01-25 13:02:11 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication