From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com> |
Subject: | Re: Hash Functions |
Date: | 2017-05-13 17:51:27 |
Message-ID: | CAMp0ubcMHH-Lwwgbooq5soU6Hmi37GxCPBB=LmoQiJyUMWO8Rw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, May 12, 2017 at 10:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Maintaining such a property for float8 (and the types that depend on it)
> might be possible if you believe that nobody ever uses anything but IEEE
> floats, but we've never allowed that as a hard assumption before.
This is not such a big practical problem (for me at least) because
hashing of floats is of dubious value.
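To make the representation issue concrete, here is a minimal sketch (not PostgreSQL code) that hashes the raw IEEE 754 binary64 bytes of a float8 value, with zlib.crc32 standing in for whatever hash is actually used. The function names are hypothetical. It shows one pitfall of bit-level float hashing: 0.0 and -0.0 compare equal but have different bit patterns, so the hash has to normalize them.

```python
import struct
import zlib

def naive_float8_hash(x: float) -> int:
    # Hypothetical helper: hash the IEEE 754 binary64 bytes. The "<d" format
    # fixes byte order and uses IEEE 754 regardless of the host platform.
    return zlib.crc32(struct.pack("<d", x))

def normalized_float8_hash(x: float) -> int:
    # Hypothetical helper: 0.0 == -0.0, so fold the sign of zero before
    # hashing to keep "equal values hash equally".
    if x == 0.0:
        x = 0.0
    return naive_float8_hash(x)

print(0.0 == -0.0)                                                  # True
print(naive_float8_hash(0.0) == naive_float8_hash(-0.0))            # False
print(normalized_float8_hash(0.0) == normalized_float8_hash(-0.0))  # True
```

Note that this only works because the sketch assumes IEEE layout, which is exactly the assumption Tom says we have never made a hard requirement.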
> Even architecture dependence isn't the whole scope of the problem.
> Consider for example dumping a LATIN1-encoded database and trying
> to reload it into a UTF8-encoded database. People will certainly
> expect that to be possible, and do you want to guarantee that the
> hash of a text value is encoding-independent?
That is a major problem. In an ideal world, we could make that work
with something like ucol_getSortKey(), but we don't require ICU, and
we can't mix ucol_getSortKey() results with strxfrm() results, or even
compare strxfrm() results from different platforms.
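A small illustration of why a byte-oriented hash of text is not encoding-independent: the same value is stored as different bytes in LATIN1 and UTF8. This is a sketch; zlib.crc32 is only a stand-in for any hash over the stored bytes, and byte_hash is a hypothetical name.

```python
import zlib

def byte_hash(data: bytes) -> int:
    # Hypothetical stand-in for any hash computed over a text value's bytes.
    return zlib.crc32(data)

value = "caf\u00e9"               # "café"
latin1 = value.encode("latin-1")  # b'caf\xe9'
utf8 = value.encode("utf-8")      # b'caf\xc3\xa9'

print(latin1 == utf8)  # False
print(hex(byte_hash(latin1)), hex(byte_hash(utf8)))
# Pure-ASCII text encodes identically, so only non-ASCII values move.
print(byte_hash("cafe".encode("latin-1")) == byte_hash("cafe".encode("utf-8")))  # True
```

So after a LATIN1 to UTF8 reload, rows with non-ASCII keys would hash to different values (and, under hash partitioning, potentially different partitions).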
I don't think it's correct to hash the code points, either, because
strings may be considered equal in a locale even if the code points
aren't identical. But I don't think postgres lives up to that standard
currently.
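As an example of strings that can be "equal" without identical code points: a precomposed "é" and "e" followed by a combining acute accent are canonically equivalent in Unicode. The sketch below (hypothetical helper names, zlib.crc32 as a stand-in hash) shows that hashing code points directly separates them, while normalizing to NFC first does not. Normalization only covers canonical equivalence; a locale's collation may treat still more strings as equal, so this is at best a partial answer.

```python
import unicodedata
import zlib

def codepoint_hash(s: str) -> int:
    # Hypothetical: hash the code point sequence (serialized as UTF-8).
    return zlib.crc32(s.encode("utf-8"))

def nfc_hash(s: str) -> int:
    # Hypothetical: normalize to NFC first so canonically equivalent
    # strings hash alike.
    return codepoint_hash(unicodedata.normalize("NFC", s))

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(nfc_hash(precomposed) == nfc_hash(decomposed))            # True
```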
But hash partitioning is too valuable to give up on entirely. I think
we should consider supporting a limited subset of types for now with
something not based on the hash AM (access method).
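One possible reading of "a limited subset of types with something not based on the hash AM": give each supported type a hash whose byte-level definition is fully specified, so partition assignment does not depend on architecture. The sketch below does this for int8 keys only; the names are hypothetical, zlib.crc32 is a stand-in rather than a recommendation, and this is not how PostgreSQL implements hash partitioning.

```python
import zlib

def portable_int8_hash(value: int) -> int:
    # Hypothetical: encode as fixed 8-byte big-endian two's complement
    # (independent of host endianness), then apply a fixed byte hash.
    # Raises OverflowError if value is outside the int8 range.
    return zlib.crc32(value.to_bytes(8, "big", signed=True))

def partition_for(value: int, modulus: int) -> int:
    # Hypothetical: route an int8 key to one of `modulus` partitions.
    return portable_int8_hash(value) % modulus

print([partition_for(k, 4) for k in (1, 2, 3, 42)])
```

Because every step is defined on an explicit byte encoding, the same key maps to the same partition on any platform; the trade-off is that each supported type needs its own carefully specified encoding.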
Regards,
Jeff Davis