Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, pgsql-bugs(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date: 2018-01-26 23:22:26
Message-ID: 12511.1517008946@sss.pgh.pa.us
Lists: pgsql-bugs

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> I suspect you're right the hash is biased to lohalf bits, as you wrote
> in the 19/12 message.

I don't see any bias in what it's doing, which is basically xoring the
two halves and hashing the result. It's possible though that Todd's
data set contains values in which corresponding bits of the high and
low halves are correlated somehow, in which case the xor would produce
a lot of cancellation and a relatively small number of distinct outputs.
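
To make the cancellation concrete, here is a minimal sketch of the folding
step only. It assumes the value is treated as an unsigned 64-bit quantity and
leaves out both the final hashing step and any special handling of negative
values the real function may have. The data set is invented for illustration:
every value has a high half equal to its low half, which is the extreme case
of the correlation described above.

```python
MASK32 = 0xFFFFFFFF


def fold_int8(val: int) -> int:
    """XOR the high and low 32-bit halves of a 64-bit value (sketch only)."""
    lohalf = val & MASK32
    hihalf = (val >> 32) & MASK32
    return lohalf ^ hihalf


# Hypothetical data: the high half of every value equals its low half.
correlated = [(x << 32) | x for x in range(1, 1001)]

# Comparison data: small values whose high half is always zero.
plain = list(range(1, 1001))

distinct_correlated = len({fold_int8(v) for v in correlated})
distinct_plain = len({fold_int8(v) for v in plain})

print("distinct folded values, correlated halves:", distinct_correlated)
print("distinct folded values, plain values:     ", distinct_plain)
```

Whatever hash is applied afterwards cannot produce more distinct outputs than
the fold hands it, so a thousand distinct inputs that fold to the same word
all land on one hash value.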

If we weren't bound by backwards compatibility, we could consider changing
to logic more like "if the value is within the int4 range, apply int4hash,
otherwise hash all 8 bytes normally". But I don't see how we can change
that now that hash indexes are first-class citizens.
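
A rough sketch of that alternative logic, for illustration only. The 32-bit
hash below is a stand-in (truncated BLAKE2b), not PostgreSQL's hash function,
and the names hash_int4 and hash_int8_alt are hypothetical. The point is only
the shape of the branch: values in the int4 range go through the int4 path,
presumably so that equal values hash the same across the two types, and
everything else has all 8 bytes hashed.

```python
import hashlib
import struct

INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


def _hash_bytes(data: bytes) -> int:
    """Stand-in 32-bit hash: BLAKE2b truncated to 4 bytes (an assumption)."""
    digest = hashlib.blake2b(data, digest_size=4).digest()
    return int.from_bytes(digest, "little")


def hash_int4(val: int) -> int:
    """Hypothetical int4 hash: hash the 4-byte representation."""
    return _hash_bytes(struct.pack("<i", val))


def hash_int8_alt(val: int) -> int:
    """Hypothetical int8 hash following the logic quoted above."""
    if INT4_MIN <= val <= INT4_MAX:
        return hash_int4(val)                 # within int4 range
    return _hash_bytes(struct.pack("<q", val))  # hash all 8 bytes


# Invented data with identical high and low halves.
correlated = [(x << 32) | x for x in range(1, 1001)]
distinct_alt = len({hash_int8_alt(v) for v in correlated})
print("distinct hashes for correlated values:", distinct_alt)
```

With all 8 bytes fed to the hash, correlation between the halves no longer
cancels anything out before hashing.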

In any case, we still need a fix for the behavior that the hash table size
is blown out by lots of collisions, because that can happen no matter what
the hash function is. Andres seems to have dropped the ball on doing
something about that.
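
The following toy model is one way to picture the problem and one possible
guard against it. It is not PostgreSQL's hash table code. It assumes an
open-addressing table that doubles in size whenever an insertion would need a
long probe sequence; the constants MAX_PROBE, MIN_FILL and SIZE_CAP are
invented for the sketch. When many keys share one hash value, doubling never
shortens the probe sequence, so the unguarded table keeps growing. The guarded
variant refuses to grow for that reason while the table is nearly empty.

```python
MAX_PROBE = 8        # hypothetical probe-distance limit that triggers growth
MIN_FILL = 0.1       # hypothetical: no collision-driven growth below this fill
SIZE_CAP = 1 << 16   # safety cap so the unguarded toy terminates


class ToyTable:
    """Linear-probing table storing (hash, key) pairs; keys assumed distinct."""

    def __init__(self, guarded: bool, size: int = 16):
        self.guarded = guarded
        self.size = size
        self.slots = [None] * size
        self.count = 0

    def _probe_distance(self, h: int) -> int:
        # Number of occupied slots between the home slot and the first free one.
        mask = self.size - 1
        i, dist = h & mask, 0
        while self.slots[i] is not None:
            i = (i + 1) & mask
            dist += 1
        return dist

    def _grow(self) -> None:
        # Double the table and reinsert every entry.
        old = [e for e in self.slots if e is not None]
        self.size *= 2
        self.slots = [None] * self.size
        for h, key in old:
            self._place(h, key)

    def _place(self, h: int, key) -> None:
        mask = self.size - 1
        i = h & mask
        while self.slots[i] is not None:
            i = (i + 1) & mask
        self.slots[i] = (h, key)

    def insert(self, h: int, key) -> None:
        if self.count + 1 > 0.9 * self.size:   # ordinary load-based growth
            self._grow()
        while self._probe_distance(h) > MAX_PROBE and self.size < SIZE_CAP:
            if self.guarded and self.count / self.size < MIN_FILL:
                break                          # nearly empty: growing cannot help
            self._grow()
        self._place(h, key)
        self.count += 1


unguarded = ToyTable(guarded=False)
guarded = ToyTable(guarded=True)
for k in range(20):
    unguarded.insert(0, k)   # every key collides on hash value 0
    guarded.insert(0, k)

print("unguarded size:", unguarded.size, "entries:", unguarded.count)
print("guarded size:  ", guarded.size, "entries:", guarded.count)
```

In the toy, twenty colliding keys push the unguarded table all the way to the
cap, while the guarded one stays small and simply tolerates long probe
sequences. Whether a fill-factor check is the right guard for the real code is
a separate question; the sketch only shows that the blow-up is independent of
which hash function produced the collisions.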

regards, tom lane
