Re: Queryplan within FTS/GIN index -search.

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <jesper(at)krogh(dot)cc>,<pgsql-performance(at)postgresql(dot)org>, <oleg(at)sai(dot)msu(dot)su>, <teodor(at)sigaev(dot)ru>
Subject: Re: Queryplan within FTS/GIN index -search.
Date: 2009-11-03 21:50:58
Message-ID: 4AF05163020000250002C212@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> But the particular example shown here doesn't make a very good case
>> for that, because it's hard to tell how much of a penalty would be
>> taken in more realistic examples.
>
> Fair enough. We're in the early stages of moving to tsearch2 and I
> haven't run across this yet in practice. If I do, I'll follow up.

We have a staging database which allowed some limited testing quickly.
While it's real production data, we haven't been gathering this type
of data long, so it's got relatively few rows; therefore, it wasn't
feasible to try any tests which would be disk-bound, so I primed the
cache for all of these, and they are all totally served from cache.
For various reasons which I'll omit unless asked, we do our text
searches through functions which take a "selection string", turn it
into a tsquery with a little extra massaging on our part, run the
query with a minimum ranking to return, and return a set of records
ordered by the ranking in descending sequence.

Under these conditions there is a slight performance gain in adding an
additional test which matches 1356 out of 1691 rows. Not surprisingly
for a fully cached query set, timings were very consistent from run to
run. While undoubtedly a little unusual in approach, this is
production software run against real-world data. I confirmed that it
is using the GIN index on the tsvector for these runs.

By the way, the tsearch2 features have been received very well so
far. One of the first reactions from most users is surprise at how
fast it is. :-) Anyway, our production results don't confirm the
issue shown with the artificial test data.


scca=> select count(*) from "DocThumbnail" where "text" is not null;
count
-------
1691
(1 row)

Time: 0.619 ms

scca=> select count(*) from (select "DocThumbnail_text_rank"('guardian
ad litem', 0.1)) x;
count
-------
41
(1 row)

Time: 19.394 ms

scca=> select count(*) from (select "DocThumbnail_text_rank"('guardian
ad litem attorney', 0.1)) x;
count
-------
4
(1 row)

Time: 16.434 ms

scca=> select count(*) from (select
"DocThumbnail_text_rank"('attorney', 0.1)) x;
count
-------
1356
(1 row)

Time: 415.056 ms

scca=> select count(*) from (select "DocThumbnail_text_rank"('guardian
ad litem party', 0.1)) x;
count
-------
2
(1 row)

Time: 16.290 ms

scca=> select count(*) from (select "DocThumbnail_text_rank"('party',
0.1)) x;
count
-------
935
(1 row)

Time: 386.941 ms

-Kevin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2009-11-03 23:43:16 Re: Optimizer + bind variables
Previous Message Jeremy Harris 2009-11-03 20:42:56 Re: Free memory usage Sol10, 8.2.9