From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Date: | 2017-12-06 19:57:44 |
Message-ID: | ebe8b411-006e-7863-3c60-67ddafe1c636@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 12/06/2017 08:35 PM, Andres Freund wrote:
> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>
>>> Based on the initial discussion, the problem here is twofold.
>>>
>>> Firstly, the code assumes that if it gets certain number of bucket
>>> collisions (essentially, the initial bucket and certain number of
>>> neighboring buckets already being full), making the table larger is
>>> guaranteed to reduce the number of collisions.
>>>
>>> Which is obviously untrue for duplicate hash values, but it may also
>>> happen for distinct hash values that form a chain (think a sequence of
>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
>
>> I've managed to construct (by sheer brute force) an example that breaks
>> simplehash in this way. It triggers the growth by having a sequence of
>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
>> then trying to insert a value with duplicate hash (say, 4).
>
> I fail to be excited by this. That's possible for any sort of hashtable
> in some way. You might hit issues due to resizing or due to lookup
> performance, but you'll hit problem. That's a fairly fundamental issue
> of pure hashtables.
>
Eh? I think you misunderstood the issue this triggers. I'm perfectly
fine with poor lookup performance - that's expected and reasonable. But
it actually triggers the same behavior as the already presented example,
i.e. the hash table grows indefinitely (so eventually gets OOM).
I really doubt allocating 3GB of memory (which is when my laptop gives
up and kills the query) to do distinct on 50MB table is a sane behavior.
Even if the table is constructed in so intentionally evil way.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2017-12-06 20:00:05 | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Previous Message | Andres Freund | 2017-12-06 19:35:45 | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |