From: | Jan Urbański <wulczer(at)wulczer(dot)org> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: gincostestimate |
Date: | 2010-07-25 21:42:57 |
Message-ID: | 4C4CAFE1.5090208@wulczer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 02/07/10 14:33, Teodor Sigaev wrote:
> Patch implements much more accuracy estimation of cost for GIN index
> scan than generic cost estimation function.
Hi,
I'm reviewing this patch, and to begin with it I tried to reproduce the
problem that originally came up on -performance in
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php
The links from that mail are now dead, so I set up my own test environment:
* one table testfts(id serial, body text, body_fts tsvector)
* 50000 rows, each with 1000 random words taken from
/usr/share/dict/british-english-insane (the wbritish-insane Debian
package) separated by a single space
* each row also had the word "commonterm" at the end, 80% had
commonterm80, 60% had commonterm60 etc (using the same methodology as
Jesper, that commonterm60 can appear only if commonterm80 is in the row)
* a GIN index on the tsvectors
I was able to reproduce his issue, that is: select id from ftstest where
body_fts @@ to_tsquery('commonterm80'); was choosing a sequential scan,
which was resulting in much longer execution than the bitmap index plan
that I got after disabling seqscans.
I then applied the patch, recompiled PG and tried again... and nothing
changed. I first tried running ANALYSE and then dropping and recreating
the GIN index, but the planner still chooses the seq scan.
Full explains below (the NOTICE is a debugging aid from the patch, which
I temporarily enabled to see if it's picking up the code).
I'll continue reading the code and trying to understand what it does,
but in the meantime: am I doing something wrong that I don't see the
planner switching to the bitmap index plan? I see that the difference in
costs is small, so maybe I just need to tweak the planner knobs a bit?
Is the output below expected?
Cheers,
Jan
wulczer=# explain analyse select id from ftstest where body_fts @@
to_tsquery('commonterm80');
NOTICE: GIN stats: nEntryPages: 49297.000000 nDataPages: 16951.000000
nPendingPages :0.000000 nEntries: 277521.000000
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on ftstest (cost=0.00..1567.00 rows=39890 width=4) (actual
time=221.893..33179.794 rows=39923 loops=1)
Filter: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 33256.661 ms
(3 rows)
wulczer=# set enable_seqscan to false;
SET
Time: 0.257 ms
wulczer=# explain analyse select id from ftstest where body_fts @@
to_tsquery('commonterm80');
NOTICE: GIN stats: nEntryPages: 49297.000000 nDataPages: 16951.000000
nPendingPages :0.000000 nEntries: 277521.000000
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest (cost=449.15..1864.50 rows=39890 width=4)
(actual time=107.421..181.284 rows=39923 loops=1)
Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
-> Bitmap Index Scan on ftstest_gin_idx (cost=0.00..439.18
rows=39890 width=0) (actual time=97.057..97.057 rows=39923 loops=1)
Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 237.218 ms
(5 rows)
Time: 237.999 ms
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2010-07-25 21:44:00 | Re: TwoPO: experimental join order algorithm |
Previous Message | Robert Haas | 2010-07-25 21:41:49 | Re: psql \timing output supressed in quiet mode |