Re: BUG #17746: Partitioning by hash of a text depends on icu version when text collation is not deterministic

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andrewbille(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Date: 2023-01-11 17:39:06
Message-ID: 530248.1673458746@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> What bothers me is that partitioning depends on the hash that can be
> computed differently with the OS upgrade/migration.

There's basically no way to avoid such problems with a non-deterministic
collation. The hash function is required to compute the same hash for
all values that compare equal, and that set can change if the collation
does. Even if the collation hasn't changed in any user-visible way,
what we are hashing for such cases is the result of ucol_getSortKey(),
and the new collation version might well produce a different answer.
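As a rough illustration of that constraint, here is a toy sketch in Python. It is not PostgreSQL's hash function and does not call ICU: `sort_key_v1` and `sort_key_v2` are hypothetical stand-ins for what ucol_getSortKey() might return under two collation versions, and `partition_for` is a made-up helper that hashes the key with SHA-256. Both "versions" treat exactly the same strings as equal (case-insensitively), yet they emit different key bytes, so the hash and therefore the partition can change.

```python
import hashlib


def sort_key_v1(s: str) -> bytes:
    # Hypothetical "collation version 1": case-insensitive equality,
    # sort key emitted as UTF-8 bytes of the case-folded string.
    return s.casefold().encode("utf-8")


def sort_key_v2(s: str) -> bytes:
    # Hypothetical "collation version 2": the same strings compare equal,
    # but the sort key bytes are laid out differently (UTF-16BE here).
    return s.casefold().encode("utf-16-be")


def partition_for(value: str, sort_key, modulus: int = 4) -> int:
    # Hash the sort key rather than the raw string, so that all values
    # that compare equal under the collation land in the same partition.
    digest = hashlib.sha256(sort_key(value)).digest()
    return int.from_bytes(digest[:8], "big") % modulus


if __name__ == "__main__":
    for value in ("Postgres", "POSTGRES", "postgres"):
        print(
            value,
            partition_for(value, sort_key_v1),
            partition_for(value, sort_key_v2),
        )
```

Within one version, all three spellings map to the same partition, which is the property the hash function has to guarantee. Across versions nothing ties the two partition numbers together, even though no comparison result changed.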

Personally, I think hash partitioning is an anti-pattern that ought
to come with bright red warning flags in the docs. If you think you
want it, you're generally wrong, for a number of reasons beyond this.

(Admittedly, range partitioning can also get broken by collation
updates, but at least that doesn't happen without user-visible
behavioral changes in the collation.)

regards, tom lane
