From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Date: | 2017-12-06 21:21:17 |
Message-ID: | 0c3d7ca9-1619-8b78-0a33-277a08db335a@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 12/06/2017 10:12 PM, Andres Freund wrote:
> On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
>> On 12/06/2017 09:46 PM, Andres Freund wrote:
>>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>>>> It's one thing when the hash table takes longer to lookup something or
>>>
>>> longer aka "forever".
>>>
>>
>> Not necessarily.
>
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
>
> Meh, we're talking about adversarial attacks here.
>
>
>>>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>>>> take. I'm perfectly fine with that, particularly when it's a worst-case
>>>> evil data set like this one.
>>>
>>> I think the way to prevent that kind of attack is to add randomization.
>>>
>>
>> By randomization you mean universal hashing [1], or something else?
>
> No, adding in a random seed to the hash. It'd be a lot better if we
> had a way to provide internal state to the hashfunction, but even
> just using hash_combine()ing a random number into a hash would be a
> lot better than what we're doing now.
>
The universal hashing (of "our" hash value) would have the benefit that
we could change it when growing the hash table. I.e. do this
h(x) = ((A * hash_any(x) + B) mod P)) mod M
and generate new A, B, P whenever we grow the hashtable. That would
further reduce the probability of either of the failures described in
this thread (I mean, hitting it again after growing the table).
I don't know if it's worth the additional CPU cycles, of course.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Todd A. Cook | 2017-12-06 21:38:18 | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Previous Message | Andres Freund | 2017-12-06 21:12:25 | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |