bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-performance(at)postgresql(dot)org
Subject: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Date: 2009-10-26 20:02:57
Message-ID: 4AE60071.5030301@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi.

I'm currently trying to figure out why the tsearch performance seems to
vary a lot between different queryplans. I have created a sample dataset
that sort of resembles the data I have to work on.

The script that builds the dataset is at:
http://krogh.cc/~jesper/build-test.pl
and http://krogh.cc/~jesper/words.txt is needed for it to run.

Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.

The dataset consists of words randomized, but .. all records contains
"commonterm", around 80% contains commonterm80 and so on..

my $rand = rand();
push @doc,"commonterm" if $commonpos == $j;
push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

Results are run multiple times after each other so they should be
reproducible:

ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Seq Scan on ftstest (cost=0.00..10750.00 rows=40188 width=4) (actual
time=0.102..1792.215 rows=40082 loops=1)
Filter: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 1809.437 ms
(3 rows)

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest (cost=115389.14..125991.96 rows=40188
width=4) (actual time=17.445..197.356 rows=40082 loops=1)
Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
-> Bitmap Index Scan on ftstest_gin_idx (cost=0.00..115379.09
rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 204.201 ms
(5 rows)

Given that the seq-scan have to visit 50K row to create the result and
the bitmap heap scan only have to visit 40K (but search the index) we
would expect the seq-scan to be at most 25% more expensive than the
bitmap-heap scan.. e.g. less than 300ms.

Jesper
--
Jesper

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2009-10-27 04:57:05 Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Previous Message Waldomiro 2009-10-26 19:56:12 Re: query planning different in plpgsql?