Re: tsearch2, large data and indexes

From: Ivan Voras <ivoras(at)freebsd(dot)org>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: tsearch2, large data and indexes
Date: 2014-04-23 11:08:25
Message-ID: CAF-QHFVYw5-2MwAVqCpZp8spyEfVP_k918naG=XV0ypTe=gOvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 22 April 2014 17:58, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Apr 22, 2014 at 12:57 AM, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
>>
>> On 22 April 2014 08:40, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
>> wrote:
>> > On 04/20/2014 02:15 AM, Ivan Voras wrote:
>> >> More details: after thinking about it some more, it might have
>> >> something to do with tsearch2 and indexes: the large data in this case
>> >> is a tsvector, indexed with GIN, and the query plan involves a
>> >> re-check condition.
>
>
> I think bitmap scans always insert a recheck, do to the possibility of
> bitmap overflow.
>
> But that doesn't mean that it ever got triggered. In 9.4., explain
> (analyze) will report on the overflows.

Ok, I found out what is happening, quoting from the documentation:

"GIN indexes are not lossy for standard queries, but their performance
depends logarithmically on the number of unique words. (However, GIN
indexes store only the words (lexemes) oftsvector values, and not
their weight labels. Thus a table row recheck is needed when using a
query that involves weights.)"

My query doesn't have weights but the tsvector in the table has them -
I take it this is what is meant by "involves weights."

So... there's really no way for tsearch2 to produce results based on
the index alone, without recheck? This is... limiting.

>> Yes, I've read about tsearch2 and GIN indexes and there shouldn't be a
>> recheck condition - but there is.
>> This is the query:
>>
>> SELECT documents.id, title, raw_data, q, ts_rank(fts_data, q, 4) AS
>> rank, html_filename
>> FROM documents, to_tsquery('document') AS q
>> WHERE fts_data @@ q
>> ORDER BY rank DESC LIMIT 25;
>>
>> And here is the explain analyze: http://explain.depesz.com/s/4xm
>> It clearly shows a bitmap index scan operation is immediately followed
>> by a recheck operation AND that the recheck operation actually does
>> something, because it reduces the number of records from 61 to 58
>> (!!!).
>
>
> That could be ordinary visibility checking, not qual rechecking.

Visibility as in transaction-wise? It's not, this was the only client
connected to the dev server, and the only transaction(s) happening.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matheus de Oliveira 2014-04-23 12:26:30 Re: tsearch2, large data and indexes
Previous Message hubert depesz lubaczewski 2014-04-23 10:28:59 Re: Best practices for update timestamp with/without triggers