Quick Links

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date:	2017-12-06 21:21:17
Message-ID:	0c3d7ca9-1619-8b78-0a33-277a08db335a@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On 12/06/2017 10:12 PM, Andres Freund wrote:
> On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
>> On 12/06/2017 09:46 PM, Andres Freund wrote:
>>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>>>> It's one thing when the hash table takes longer to lookup something or
>>>
>>> longer aka "forever".
>>>
>>
>> Not necessarily.
>
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
>
> Meh, we're talking about adversarial attacks here.
>
>
>>>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>>>> take. I'm perfectly fine with that, particularly when it's a worst-case
>>>> evil data set like this one.
>>>
>>> I think the way to prevent that kind of attack is to add randomization.
>>>
>>
>> By randomization you mean universal hashing [1], or something else?
>
> No, adding in a random seed to the hash. It'd be a lot better if we
> had a way to provide internal state to the hashfunction, but even
> just using hash_combine()ing a random number into a hash would be a
> lot better than what we're doing now.
>

The universal hashing (of "our" hash value) would have the benefit that
we could change it when growing the hash table. I.e. do this

h(x) = ((A * hash_any(x) + B) mod P)) mod M

and generate new A, B, P whenever we grow the hashtable. That would
further reduce the probability of either of the failures described in
this thread (I mean, hitting it again after growing the table).

I don't know if it's worth the additional CPU cycles, of course.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop at 2017-12-06 21:12:25 from Andres Freund

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Todd A. Cook	2017-12-06 21:38:18	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Previous Message	Andres Freund	2017-12-06 21:12:25	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop