Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, <olaf(dot)gw(at)googlemail(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12694: crash if the number of result rows is lower than gin_fuzzy_search_limit
Date: 2015-01-29 14:07:49
Message-ID: 54CA3EB5.3030804@vmware.com
Lists: pgsql-bugs

On 01/29/2015 03:09 PM, Michael Paquier wrote:
> On Thu, Jan 29, 2015 at 3:12 AM, <olaf(dot)gw(at)googlemail(dot)com> wrote:
>> Bug reference: 12694
>> Logged by: Olaf Gawenda
>> Email address: olaf(dot)gw(at)googlemail(dot)com
>> PostgreSQL version: 9.4.0
>> Operating system: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7
>> Description:
>>
>> the following sequence of commands causes a crash if the number of result rows
>> is lower than gin_fuzzy_search_limit:
>>
>> create table test (t text, ts_vec tsvector);
>>
>> insert into test (t) values (),(),(), ...; -- test data not posted
>>
>> update test set ts_vec = to_tsvector('english', t);
>>
>> create index on test using gin(ts_vec);
>> analyze test;
>> set enable_seqscan = off;
>> set gin_fuzzy_search_limit = 1000;
>>
>> select t from test where ts_vec @@ to_tsquery('english', '...');
>
> This can be reproduced easily with a test case like this:
> create table aa as
> select array[(random() * 1000000)::int,
> (random() * 1000000)::int,
> (random() * 1000000)::int] as a
> from generate_series(1,10);
> create index aai on aa using gin(a);
> set gin_fuzzy_search_limit = 1;
> set enable_seqscan = off;
> select * from aa where a <@ array[1,2];

The problem is in the startScan() function:

>     if (GinFuzzySearchLimit > 0)
>     {
>         /*
>          * If all of keys more than threshold we will try to reduce result, we
>          * hope (and only hope, for intersection operation of array our
>          * supposition isn't true), that total result will not more than
>          * minimal predictNumberResult.
>          */
>
>         for (i = 0; i < so->totalentries; i++)
>             if (so->entries[i]->predictNumberResult <= so->totalentries * GinFuzzySearchLimit)
>                 return;
>
>         for (i = 0; i < so->totalentries; i++)
>             if (so->entries[i]->predictNumberResult > so->totalentries * GinFuzzySearchLimit)
>             {
>                 so->entries[i]->predictNumberResult /= so->totalentries;
>                 so->entries[i]->reduceResult = TRUE;
>             }
>     }
>
>     for (i = 0; i < so->nkeys; i++)
>         startScanKey(ginstate, so, so->keys + i);
> }

If the early return is taken, startScanKey() is not called, and many
fields in the GinScanKey struct are left uninitialized. That causes the
segfault later.

This was not as big a problem before 9.4, because startScanKey() didn't
do very much. It just reset a few fields, which in a new scan were reset
already by ginNewScanKey(). But it is in fact possible to get an
assertion failure on 9.3 too, if the plan contains a re-scan of a GIN
scan and gin_fuzzy_search_limit is set. Attached is a script that
triggers it. Not sure why, but I'm not seeing a segfault or assertion
failure on branches older than 9.3. The plan of the segfaulting query
looks identical between 9.2 and 9.3, so perhaps the executor changed how
and when it calls rescan. Nevertheless, the code looks just as wrong on
the earlier branches, so I think the fix should be back-patched all the
way to 9.1, where that early return in startScan() was introduced.

The fix is simple: make sure that startScanKey() is always called, by
getting rid of the early return above. Attached. I'll apply this later
today or tomorrow unless someone sees a problem with this.
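
For illustration only, here is a toy C model of the control flow with
and without the early return. It is a sketch, not the attached patch:
ModelEntry, ModelKey, model_start_key() and the two model_start_scan_*()
functions are invented names standing in for the GIN scan structures and
startScanKey(), and recording the decision in a flag is just one
possible way to drop the return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GIN scan entry and scan key structs. */
typedef struct { long predicted; bool reduce; } ModelEntry;
typedef struct { bool started; } ModelKey;

/* Stands in for startScanKey(): marks the key as initialized. */
void model_start_key(ModelKey *key)
{
    assert(key != NULL);
    key->started = true;
}

/* Control flow as quoted above: the early return skips the key loop. */
void model_start_scan_old(ModelEntry *entries, size_t nentries,
                          ModelKey *keys, size_t nkeys, long limit)
{
    size_t i;

    if (limit > 0)
    {
        for (i = 0; i < nentries; i++)
            if (entries[i].predicted <= (long) nentries * limit)
                return;             /* keys[] is left untouched */

        for (i = 0; i < nentries; i++)
            if (entries[i].predicted > (long) nentries * limit)
            {
                entries[i].predicted /= (long) nentries;
                entries[i].reduce = true;
            }
    }

    for (i = 0; i < nkeys; i++)
        model_start_key(&keys[i]);
}

/* Same decision, kept in a flag so the key loop always runs. */
void model_start_scan_fixed(ModelEntry *entries, size_t nentries,
                            ModelKey *keys, size_t nkeys, long limit)
{
    size_t i;

    if (limit > 0)
    {
        bool reduce = true;

        /* Reduce only if every entry is above the threshold. */
        for (i = 0; i < nentries; i++)
            if (entries[i].predicted <= (long) nentries * limit)
            {
                reduce = false;
                break;
            }

        if (reduce)
            for (i = 0; i < nentries; i++)
            {
                entries[i].predicted /= (long) nentries;
                entries[i].reduce = true;
            }
    }

    for (i = 0; i < nkeys; i++)
        model_start_key(&keys[i]);
}
```

With two entries predicted at 5 and 5000 rows and a limit of 1000, the
old variant returns with the key still unstarted, while the fixed
variant starts the key and leaves the predictions alone.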

- Heikki

Attachment                                                       Content-Type  Size
gin-rescan-assert-9.3.txt                                        text/plain    618 bytes
0001-Fix-bug-where-GIN-scan-keys-were-not-initialized-wit.patch  text/x-diff   2.0 KB
