Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date: 2017-12-06 19:57:44
Message-ID: ebe8b411-006e-7863-3c60-67ddafe1c636@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 12/06/2017 08:35 PM, Andres Freund wrote:
> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>
>>> Based on the initial discussion, the problem here is twofold.
>>>
>>> Firstly, the code assumes that if it gets certain number of bucket
>>> collisions (essentially, the initial bucket and certain number of
>>> neighboring buckets already being full), making the table larger is
>>> guaranteed to reduce the number of collisions.
>>>
>>> Which is obviously untrue for duplicate hash values, but it may also
>>> happen for distinct hash values that form a chain (think a sequence of
>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
>
>> I've managed to construct (by sheer brute force) an example that breaks
>> simplehash in this way. It triggers the growth by having a sequence of
>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
>> then trying to insert a value with duplicate hash (say, 4).
>
> I fail to be excited by this. That's possible for any sort of hashtable
> in some way. You might hit issues due to resizing or due to lookup
> performance, but you'll hit problem. That's a fairly fundamental issue
> of pure hashtables.
>

Eh? I think you misunderstood the issue this triggers. I'm perfectly
fine with poor lookup performance - that's expected and reasonable. But
it actually triggers the same behavior as the already presented example,
i.e. the hash table grows indefinitely (so eventually gets OOM).

I really doubt allocating 3GB of memory (which is when my laptop gives
up and kills the query) to do distinct on 50MB table is a sane behavior.
Even if the table is constructed in so intentionally evil way.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2017-12-06 20:00:05 Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Previous Message Andres Freund 2017-12-06 19:35:45 Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop