From: | "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Date: | 2017-12-05 22:33:25 |
Message-ID: | 219b37b2-d740-d04a-203a-49ae422028f7@blackducksoftware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 12/05/17 17:01, Tomas Vondra wrote:
>
>
> On 12/05/2017 10:23 PM, Todd A. Cook wrote:
>> On 11/27/17 23:03, Tom Lane wrote:
>>> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>>>> When SH_INSERT tries to insert that final extra value, insertdist
>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>>> double the size (at least until my computer gives up, somewhere around
>>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB
>>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>>> for that. What is the theory behind this parameter?
>>>
>>> You beat me to it --- after looking at simplehash.h I'd guessed that
>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>>> an infinite loop, but I'd not gotten to determining which one yet.
>>>
>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither
>>> of them are obviously loop-proof.
>>>
>>> Note that the sample data has a lot of collisions:
>>>
>>> regression=# select hashint8(val), count(*) from reproducer group by 1
>>> order by 2 desc;
>>> hashint8 | count
>>> -------------+-------
>>> 441526644 | 2337
>>> -1117776826 | 1221
>>> -1202007016 | 935
>>> -2068831050 | 620
>>> 1156644653 | 538
>>> 553783815 | 510
>>> 259780770 | 444
>>> 371047036 | 394
>>> 915722575 | 359
>>> ... etc etc ...
>>>
>>> It's evidently more complicated than just that the code fails with
>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>>> by a lot. There needs to be a safety valve that prevents letting
>>> the hash fill factor approach zero, which is what's happening in
>>> this test case.
>>
>> FWIW, I can also reproduce the infinite loop with 167834 unique values.
>>
>
> Unique values or unique *hash* values?
Unique values.
> Can you share the data, so that whoever fixes the bug can verify it also
> fixes your example?
Sure. It's attached.
-- todd
Attachment | Content-Type | Size |
---|---|---|
reproducer_2.pgsql | text/plain | 3.3 MB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dipesh Kamdar | 2017-12-05 23:13:44 | Re: BUG #14938: ALTER TABLE hang/ poor performance |
Previous Message | Tomas Vondra | 2017-12-05 22:01:01 | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |