From: | "Dmitry Koterov" <dmitry(at)koterov(dot)ru> |
---|---|
To: | "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> |
Cc: | "Postgres General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: TSearch2: find a QUERY that does match a single document |
Date: | 2008-09-13 19:12:37 |
Message-ID: | d7df81620809131212x1b79f1b0n81b98e0e2b9f7ad4@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
>
> explain analyze
>> select * from test.test_tsq
>> where to_tsvector('400000x400000') @@ q
>>
>
> why do you need tsvector @@ q ? Much better to use tsquery = tsquery
>
> test=# explain analyze select * from test_tsq where q =
> '400000x400000'::tsque>
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------
> Seq Scan on test_tsq (cost=0.00..16667.01 rows=1 width=38) (actual
> time=129.208..341.111 rows=1 loops=1)
> Filter: (q = '''400000x400000'''::tsquery)
> Total runtime: 341.134 ms
> (3 rows)
>
M-mmm... Seems your understood me incorrectly.
I have to find NOT queries which are exactly equal to another query, BUT
queries which MATCH the GIVEN document. '400000x400000' was a sample only,
in real cases it will be 1-2K document.
Here is a more realistic sample:
explain analyze
select * from test.test_tsq
where to_tsvector('
Here is a real document text. It may be long, 1-2K.
In this sample it contains a lexem "400000x400000", so there is a tsquery
in test_tsq.q which matches this document. I need to find all such queries
fast.
Of course, in real cases the document text is unpredictable.
') @@ q
> QUERY PLAN
>> Seq Scan on test_tsq (cost=0.00..17477.01 rows=800 width=36) (actual
>> time=68.698..181.458 rows=1 loops=1)
>> Filter: ('''400000x400000'':1'::tsvector @@ q)
>> Total runtime: 181.484 ms
>>
>
> '800' is the number of estimated rows, which is not good, since you got
> only 1 row.
>
Why 800? The table contains 800000 rows, and seqscan is used. Does it scan
the whole table or not? If yes, possibly there is a bug in explain output?
(No mater if I create GIST index on test_tsq.q or not, the number of rows is
still 800, so it seems to me that GIST index is not used at all.)
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Koterov | 2008-09-13 19:27:10 | Is there bigintarray? |
Previous Message | Gregory Stark | 2008-09-13 16:50:52 | Re: "Healing" a table after massive updates |