From: | "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de> |
---|---|
To: | <pgsql-performance(at)postgresql(dot)org> |
Cc: | <oleg(at)sai(dot)msu(dot)su> |
Subject: | text search: tablescan cost for a tsvector |
Date: | 2012-02-06 11:05:28 |
Message-ID: | C4DAC901169B624F933534A26ED7DF310861B363@JENMAIL01.ad.intershop.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello,
I have quite systematically better performance with the text search when
I disable the statistics collection for the tsvector column.
So I wonder if such statistics ever make sense.
Here a testcase:
The table contains 200'000 tsvector, whereas the lexeme 'fooblablabla'
exists in all tsvector:
Without statistics, the planner decide as expected for the gin index.
After analyze, it switch to a table scan which is also expected, but the
query is 3 times slower.
My first thought was that the Bitmap Heap Scan was really fast as the
searched term is always at the first position.
So I repeated the test with an additional search term at the last
position, but without significant change:
(result from the 6. test below)
without analyze: http://explain.depesz.com/s/6At
with analyze: http://explain.depesz.com/s/r3B
best regards,
Marc Mamin
Here all my results, always one of the fastest from a few runs.
CREATE TABLE tsv_test
(
id bigserial NOT NULL,
v tsvector
);
<The code to fill the table with test data can be found below>
The test query:
explain analyze
select id from tsv_test where v @@ 'lexeme3179'::tsquery
UNION ALL
select id from tsv_test where v @@ 'lexeme5'::tsquery
UNION ALL
select id from tsv_test where v @@ 'fooblablabla'::tsquery
The results
A) on first lexeme
1) without indexes without analyze:
http://explain.depesz.com/s/bOv
2) alter table tsv_test add constraint tsv_test_pk primary key(id);
http://explain.depesz.com/s/9QQ (same as previous);
3) create index tsv_gin on tsv_test using gin(v);
http://explain.depesz.com/s/r4M <= fastest
4) ANALYZE tsv_test (id);
http://explain.depesz.com/s/MyC (same as previous);
5) ANALYZE tsv_test;
http://explain.depesz.com/s/qu3S
B) on lastlexeme
6) create table tsv_test2 as select id,
v||'zzthisisalongerlexemethisisalongerlexeme'::tsvector
from tsv_test;
explain analyze
select id from tsv_test2 where v @@
'zzthisisalongerlexemethisisalongerlexeme'::tsquery
http://explain.depesz.com/s/6At
ANALYZE tsv_test2;
http://explain.depesz.com/s/r3B
test data:
insert into tsv_test (v)
select
cast('fooblablabla' ||
' lexeme'||s%2|| ' lexeme'||s%3|| ' lexeme'||s%4||
' lexeme'||s%4|| ' lexeme'||s%5|| ' lexeme'||s%6||
' lexeme'||s%7|| ' lexeme'||s%8|| ' lexeme'||s%9||
' lexeme'||s%10 || ' lexeme2'||s%11 || ' lexeme3'||s%12 ||
' lexeme'||s%11 || ' lexeme2'||s%12 || ' lexeme3'||s%22 ||
' lexeme'||s%12 || ' lexeme2'||s%13 || ' lexeme3'||s%32 ||
' lexeme'||s%13 || ' lexeme2'||s%14 || ' lexeme3'||s%42 ||
' lexeme'||s%14 || ' lexeme2'||s%15 || ' lexeme3'||s%52 ||
' lexeme'||s%15 || ' lexeme2'||s%16 || ' lexeme3'||s%62 ||
' lexeme'||s%16 || ' lexeme2'||s%17 || ' lexeme3'||s%72 ||
' lexeme'||s%17 || ' lexeme2'||s%18 || ' lexeme3'||s%82 ||
' lexeme'||s%18 || ' lexeme2'||s%19 || ' lexeme3'||s%92 ||
' lexeme'||s%19 || ' lexeme2'||s%10 || ' lexeme3'||s%15 ||
' lexeme'||s%12 || ' lexeme2'||s%71 || ' lexeme3'||s%16 ||
' lexeme'||s%20 || ' lexeme2'||s%81 || ' lexeme3'||s%17 ||
' lexeme'||s%35 || ' lexeme2'||s%91 || ' lexeme3'||s%18 ||
' lexeme'||s%100 || ' lexeme2'||s%110 || ' lexeme3'||s%120 ||
' lexeme'||s%110 || ' lexeme2'||s%120 || ' lexeme3'||s%220 ||
' lexeme'||s%120 || ' lexeme2'||s%130 || ' lexeme3'||s%320 ||
' lexeme'||s%130 || ' lexeme2'||s%140 || ' lexeme3'||s%420 ||
' lexeme'||s%140 || ' lexeme2'||s%150 || ' lexeme3'||s%520 ||
' lexeme'||s%150 || ' lexeme2'||s%160 || ' lexeme3'||s%620 ||
' lexeme'||s%160 || ' lexeme2'||s%170 || ' lexeme3'||s%720 ||
' lexeme'||s%170 || ' lexeme2'||s%180 || ' lexeme3'||s%820 ||
' lexeme'||s%180 || ' lexeme2'||s%190 || ' lexeme3'||s%920 ||
' lexeme'||s%190 || ' lexeme2'||s%100 || ' lexeme3'||s%150 ||
' lexeme'||s%120 || ' lexeme2'||s%710 || ' lexeme3'||s%160 ||
' lexeme'||s%200 || ' lexeme2'||s%810 || ' lexeme3'||s%170 ||
' lexeme'||s%350 || ' lexeme2'||s%910 || ' lexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s
UNION ALL
select
cast('fooblablabla' ||
' thisisalongerlexemethisisalongerlexeme'||s%2|| '
thisisalongerlexemethisisalongerlexeme'||s%3|| '
thisisalongerlexemethisisalongerlexeme'||s%4||
' thisisalongerlexemethisisalongerlexeme'||s%4|| '
thisisalongerlexemethisisalongerlexeme'||s%5|| '
thisisalongerlexemethisisalongerlexeme'||s%6||
' thisisalongerlexemethisisalongerlexeme'||s%7|| '
thisisalongerlexemethisisalongerlexeme'||s%8|| '
thisisalongerlexemethisisalongerlexeme'||s%9||
' thisisalongerlexemethisisalongerlexeme'||s%10 || '
thisisalongerlexemethisisalongerlexeme2'||s%11 || '
thisisalongerlexemethisisalongerlexeme3'||s%12 ||
' thisisalongerlexemethisisalongerlexeme'||s%11 || '
thisisalongerlexemethisisalongerlexeme2'||s%12 || '
thisisalongerlexemethisisalongerlexeme3'||s%22 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%13 || '
thisisalongerlexemethisisalongerlexeme3'||s%32 ||
' thisisalongerlexemethisisalongerlexeme'||s%13 || '
thisisalongerlexemethisisalongerlexeme2'||s%14 || '
thisisalongerlexemethisisalongerlexeme3'||s%42 ||
' thisisalongerlexemethisisalongerlexeme'||s%14 || '
thisisalongerlexemethisisalongerlexeme2'||s%15 || '
thisisalongerlexemethisisalongerlexeme3'||s%52 ||
' thisisalongerlexemethisisalongerlexeme'||s%15 || '
thisisalongerlexemethisisalongerlexeme2'||s%16 || '
thisisalongerlexemethisisalongerlexeme3'||s%62 ||
' thisisalongerlexemethisisalongerlexeme'||s%16 || '
thisisalongerlexemethisisalongerlexeme2'||s%17 || '
thisisalongerlexemethisisalongerlexeme3'||s%72 ||
' thisisalongerlexemethisisalongerlexeme'||s%17 || '
thisisalongerlexemethisisalongerlexeme2'||s%18 || '
thisisalongerlexemethisisalongerlexeme3'||s%82 ||
' thisisalongerlexemethisisalongerlexeme'||s%18 || '
thisisalongerlexemethisisalongerlexeme2'||s%19 || '
thisisalongerlexemethisisalongerlexeme3'||s%92 ||
' thisisalongerlexemethisisalongerlexeme'||s%19 || '
thisisalongerlexemethisisalongerlexeme2'||s%10 || '
thisisalongerlexemethisisalongerlexeme3'||s%15 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%71 || '
thisisalongerlexemethisisalongerlexeme3'||s%16 ||
' thisisalongerlexemethisisalongerlexeme'||s%20 || '
thisisalongerlexemethisisalongerlexeme2'||s%81 || '
thisisalongerlexemethisisalongerlexeme3'||s%17 ||
' thisisalongerlexemethisisalongerlexeme'||s%35 || '
thisisalongerlexemethisisalongerlexeme2'||s%91 || '
thisisalongerlexemethisisalongerlexeme3'||s%18 ||
' thisisalongerlexemethisisalongerlexeme'||s%100 || '
thisisalongerlexemethisisalongerlexeme2'||s%110 || '
thisisalongerlexemethisisalongerlexeme3'||s%120 ||
' thisisalongerlexemethisisalongerlexeme'||s%110 || '
thisisalongerlexemethisisalongerlexeme2'||s%120 || '
thisisalongerlexemethisisalongerlexeme3'||s%220 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%130 || '
thisisalongerlexemethisisalongerlexeme3'||s%320 ||
' thisisalongerlexemethisisalongerlexeme'||s%130 || '
thisisalongerlexemethisisalongerlexeme2'||s%140 || '
thisisalongerlexemethisisalongerlexeme3'||s%420 ||
' thisisalongerlexemethisisalongerlexeme'||s%140 || '
thisisalongerlexemethisisalongerlexeme2'||s%150 || '
thisisalongerlexemethisisalongerlexeme3'||s%520 ||
' thisisalongerlexemethisisalongerlexeme'||s%150 || '
thisisalongerlexemethisisalongerlexeme2'||s%160 || '
thisisalongerlexemethisisalongerlexeme3'||s%620 ||
' thisisalongerlexemethisisalongerlexeme'||s%160 || '
thisisalongerlexemethisisalongerlexeme2'||s%170 || '
thisisalongerlexemethisisalongerlexeme3'||s%720 ||
' thisisalongerlexemethisisalongerlexeme'||s%170 || '
thisisalongerlexemethisisalongerlexeme2'||s%180 || '
thisisalongerlexemethisisalongerlexeme3'||s%820 ||
' thisisalongerlexemethisisalongerlexeme'||s%180 || '
thisisalongerlexemethisisalongerlexeme2'||s%190 || '
thisisalongerlexemethisisalongerlexeme3'||s%920 ||
' thisisalongerlexemethisisalongerlexeme'||s%190 || '
thisisalongerlexemethisisalongerlexeme2'||s%100 || '
thisisalongerlexemethisisalongerlexeme3'||s%150 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%710 || '
thisisalongerlexemethisisalongerlexeme3'||s%160 ||
' thisisalongerlexemethisisalongerlexeme'||s%200 || '
thisisalongerlexemethisisalongerlexeme2'||s%810 || '
thisisalongerlexemethisisalongerlexeme3'||s%170 ||
' thisisalongerlexemethisisalongerlexeme'||s%350 || '
thisisalongerlexemethisisalongerlexeme2'||s%910 || '
thisisalongerlexemethisisalongerlexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s
From | Date | Subject | |
---|---|---|---|
Next Message | Ofer Israeli | 2012-02-07 10:18:35 | Inserts or Updates |
Previous Message | Saurabh | 2012-02-05 17:29:22 | Re: How to improve insert speed with index on text column |