From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date: 2017-12-05 22:01:01
Message-ID: 2a76784c-06cb-46ec-35be-9895d144b839@2ndquadrant.com
Lists: pgsql-bugs
On 12/05/2017 10:23 PM, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
>> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>>> When SH_INSERT tries to insert that final extra value, insertdist
>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>> double the size (at least until my computer gives up, somewhere around
>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB
>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>> for that. What is the theory behind this parameter?
>>
>> You beat me to it --- after looking at simplehash.h I'd guessed that
>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>> an infinite loop, but I'd not gotten to determining which one yet.
>>
>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither
>> of them are obviously loop-proof.
>>
>> Note that the sample data has a lot of collisions:
>>
>> regression=# select hashint8(val), count(*) from reproducer group by 1
>> order by 2 desc;
>> hashint8 | count
>> -------------+-------
>> 441526644 | 2337
>> -1117776826 | 1221
>> -1202007016 | 935
>> -2068831050 | 620
>> 1156644653 | 538
>> 553783815 | 510
>> 259780770 | 444
>> 371047036 | 394
>> 915722575 | 359
>> ... etc etc ...
>>
>> It's evidently more complicated than just that the code fails with
>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>> by a lot. There needs to be a safety valve that prevents letting
>> the hash fill factor approach zero, which is what's happening in
>> this test case.
>
> FWIW, I can also reproduce the infinite loop with 167834 unique values.
>
Unique values or unique *hash* values?
Can you share the data, so that whoever fixes the bug can verify it also
fixes your example?
> It kinda looks like the problematic collisions arise from masking the
> computed hash values; e.g.:
>
> SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash)
> {
> return hash & tb->sizemask;
> }
>
That's a pretty standard way to compute the modulo (i.e. the index of
the hash bucket) when the table size is a power of two, and certainly
not the root cause.
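
For a power-of-two size, masking with (size - 1) and computing
(hash % size) pick the same bucket. A minimal standalone sketch (the
variable names are illustrative, not taken from simplehash.h):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t size = 1024;           /* must be a power of two */
        uint32_t sizemask = size - 1;   /* 0x3FF */
        uint32_t hash = 441526644;      /* one of the colliding hashes above */

        /* masking and modulo agree for any power-of-two size */
        assert((hash & sizemask) == (hash % size));
        return 0;
    }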
> (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.)
>
Based on the initial discussion, the problem here is twofold.
Firstly, the code assumes that if an insert runs into a certain number
of bucket collisions (essentially, the initial bucket and a certain
number of neighboring buckets already being full), making the table
larger is guaranteed to reduce the number of collisions.
That is obviously untrue for duplicate hash values, but it can also
fail for distinct hash values that form a chain (think of a sequence of
hash values 1, 2, 3, 4, 5, 6, 7, ..., K - no amount of doubling is ever
gonna fix that).
This causes the insane growth up to the largest possible hash table.
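
To illustrate: with linear probing, inserting N copies of the same hash
value pushes the last copy exactly N-1 buckets past its initial bucket,
no matter how large the table is, so the distance check keeps firing
after every doubling. A standalone simulation (not the simplehash.h
code itself; it only assumes the check is roughly
insertdist > SH_GROW_MAX_DIB):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define SH_GROW_MAX_DIB 25      /* the constant from simplehash.h */

    /* Maximum distance-from-initial-bucket (DIB) observed while
     * inserting n copies of one hash value into an empty power-of-two
     * table with linear probing. The table size drops out entirely. */
    static uint32_t max_dib(uint32_t n, uint32_t size)
    {
        uint32_t sizemask = size - 1;
        char *used = calloc(size, 1);
        uint32_t hash = 441526644;  /* any fixed value works */
        uint32_t worst = 0;

        for (uint32_t i = 0; i < n; i++)
        {
            uint32_t bucket = hash & sizemask;
            uint32_t d = 0;

            while (used[bucket])    /* probe until a free bucket */
            {
                bucket = (bucket + 1) & sizemask;
                d++;
            }
            used[bucket] = 1;
            if (d > worst)
                worst = d;
        }
        free(used);
        return worst;
    }

    int main(void)
    {
        /* 27 duplicates always reach DIB 26 > SH_GROW_MAX_DIB, so no
         * amount of growing can ever satisfy the check. */
        for (uint32_t size = 64; size <= 65536; size *= 2)
            printf("size %6u -> max DIB %u\n", size, max_dib(27, size));
        return 0;
    }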
Secondly, there's a bug: for the largest possible hash table we set
sizemask = 0, which means we never get a bucket index other than 0.
This causes the infinite loop.
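
A minimal sketch of that failure mode (again a standalone illustration,
not the actual simplehash.h code): with sizemask = 0, both the initial
bucket and every linear-probe step collapse to bucket 0, so once that
bucket is taken the probe loop can never advance.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t sizemask = 0;  /* the buggy value for the largest table */

        /* initial bucket: hash & 0 is always 0, regardless of the hash */
        uint32_t bucket = 441526644 & sizemask;

        /* linear probing: (bucket + 1) & 0 == 0, so we never move on */
        for (int i = 0; i < 5; i++)
        {
            printf("probe %d: bucket %u\n", i, bucket);
            bucket = (bucket + 1) & sizemask;
        }
        return 0;
    }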
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services