Re: WIP: store additional info in GIN index

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: store additional info in GIN index
Date: 2012-12-22 16:15:49
Message-ID: CAPpHfdt+i0rjVouRNqiGSQBBDgaYsM3UewYLmAvOU-_OfAGkfg@mail.gmail.com
Lists: pgsql-hackers

Hi!

On Thu, Dec 6, 2012 at 5:44 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> Then I've run a simple benchmarking script, and the results are not as
> good as I expected, actually I'm getting much worse performance than
> with the original GIN index.
>
> The following table contains the time of loading the data (not a big
> difference), and number of queries per minute for various number of
> words in the query.
>
> The queries look like this:
>
> SELECT id FROM messages
> WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
>
> so it's really the simplest form of FTS query possible.
>
>              without patch | with patch
> --------------------------------------------
> loading      750 sec       | 770 sec
> 1 word       1500          | 1100
> 2 words      23000         | 9800
> 3 words      24000         | 9700
> 4 words      16000         | 7200
> --------------------------------------------
>
> I'm not saying this is a perfect benchmark, but the differences (of
> querying) are pretty huge. Not sure where this difference comes from,
> but it seems to be quite consistent (I usually get +-10% results, which
> is negligible considering the huge difference).
>
> Is this an expected behaviour that will be fixed by another patch?
>

Other patches that significantly accelerate index search will follow.
This patch changes only the storage of GIN posting lists/trees. However,
it was not expected to change index scan speed significantly in either
direction.

> The database contains ~680k messages from the mailing list archives,
> i.e. about 900 MB of data (in the table), and the GIN index on tsvector
> is about 900MB too. So the whole dataset nicely fits into memory (8GB
> RAM), and it seems to be completely CPU bound (no I/O activity at all).
>
> The configuration was exactly the same in both cases
>
> shared_buffers = 1GB
> work_mem = 64 MB
> maintenance_work_mem = 256 MB
>
> I can either upload the database somewhere, or provide the benchmarking
> script if needed.

Unfortunately, I can't reproduce such a huge slowdown on my test cases.
Could you share both the database and the benchmarking script?
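
For reference, a benchmark of the kind described in the quoted text could
be sketched roughly as below. This is not the actual script from this
thread: the helper names (make_phrase, queries_per_minute), the word
selection scheme, and the %s placeholder style are all assumptions made
for illustration. Only the query text itself comes from the message above.

```python
import time
from typing import Callable, Iterable, Sequence

# Query shape quoted in the thread; "%s" is a driver-side placeholder
# (psycopg2-style parameter passing is an assumption here).
QUERY = (
    "SELECT id FROM messages "
    "WHERE body_tsvector @@ plainto_tsquery('english', %s)"
)


def make_phrase(words: Sequence[str], n: int, offset: int = 0) -> str:
    """Pick n words, cycling through the list, and join them with spaces."""
    return " ".join(words[(offset + i) % len(words)] for i in range(n))


def queries_per_minute(
    run_query: Callable[[str, tuple], object],
    phrases: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run one query per phrase and return the measured queries/minute."""
    start = clock()
    count = 0
    for phrase in phrases:
        run_query(QUERY, (phrase,))  # hypothetical callable wrapping the DB driver
        count += 1
    elapsed = clock() - start
    return count * 60.0 / elapsed if elapsed > 0 else float("inf")
```

With a real connection, run_query could wrap something like a cursor's
execute() followed by fetchall(). Running the same phrase list against
both builds (with and without the patch) keeps the comparison fair.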

------
With best regards,
Alexander Korotkov.
