From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Date: | 2017-12-06 13:33:36 |
Message-ID: | c15378b3-d4aa-a5c0-5538-dd52904dcbc6@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 12/05/2017 11:33 PM, Todd A. Cook wrote:
> On 12/05/17 17:01, Tomas Vondra wrote:
>>
>>
>> On 12/05/2017 10:23 PM, Todd A. Cook wrote:
>>> On 11/27/17 23:03, Tom Lane wrote:
>>>> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>>>>> When SH_INSERT tries to insert that final extra value, insertdist
>>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>>>> double the size (at least until my computer gives up, somewhere around
>>>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB
>>>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>>>> for that. What is the theory behind this parameter?
>>>>
>>>> You beat me to it --- after looking at simplehash.h I'd guessed that
>>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>>>> an infinite loop, but I'd not gotten to determining which one yet.
>>>>
>>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither
>>>> of them are obviously loop-proof.
>>>>
>>>> Note that the sample data has a lot of collisions:
>>>>
>>>> regression=# select hashint8(val), count(*) from reproducer group by 1
>>>> order by 2 desc;
>>>> hashint8 | count
>>>> -------------+-------
>>>> 441526644 | 2337
>>>> -1117776826 | 1221
>>>> -1202007016 | 935
>>>> -2068831050 | 620
>>>> 1156644653 | 538
>>>> 553783815 | 510
>>>> 259780770 | 444
>>>> 371047036 | 394
>>>> 915722575 | 359
>>>> ... etc etc ...
>>>>
>>>> It's evidently more complicated than just that the code fails with
>>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>>>> by a lot. There needs to be a safety valve that prevents letting
>>>> the hash fill factor approach zero, which is what's happening in
>>>> this test case.
>>>
>>> FWIW, I can also reproduce the infinite loop with 167834 unique values.
>>>
>>
>> Unique values or unique *hash* values?
>
> Unique values.
>
>
>> Can you share the data, so that whoever fixes the bug can verify it also
>> fixes your example?
>
> Sure. It's attached.
>
Seems the dataset has pretty much the same issue as the one reported
before, that is
select hashint8(val), count(distinct val), count(*) from temp_f_03 group
by 1 order by 2 desc;
hashint8 | count | count
-------------+-------+-------
-1971396144 | 45 | 45
2035896403 | 42 | 42
-1633843397 | 30 | 30
1425704662 | 29 | 29
-455482779 | 22 | 22
-300163565 | 17 | 17
-1803963420 | 17 | 17
-537082846 | 14 | 14
603707034 | 13 | 13
-176369887 | 12 | 12
1274957136 | 11 | 11
1465522632 | 11 | 11
-1589862230 | 10 | 10
-1145403239 | 10 | 10
i.e. there are many hash collisions (more than in the other data set).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | kop | 2017-12-06 16:18:39 | BUG #14950: RPM EPEL PostGIS24_10 install breaks "yum update" |
Previous Message | jarda.urik | 2017-12-06 10:50:00 | BUG #14949: array_append() - performance issues (in update) |