From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [GENERAL] Incorrect FTS result with GIN index |
Date: | 2010-07-29 11:03:32 |
Message-ID: | Pine.LNX.4.64.1007291459270.32129@sn.sai.msu.ru |
Lists: | pgsql-general pgsql-hackers |
Tom,
we're not able to work on this right now, so go ahead if you have time.
I also wonder why I got the "right" result :) I just repeated the query:
test=# select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));
count
-------
123
(1 row)
Time: 26.185 ms
Oleg
On Wed, 28 Jul 2010, Tom Lane wrote:
> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
>> you can download dump http://mira.sai.msu.su/~megera/tmp/search_tab.dump
>
> Hmm ... I'm not sure why you're failing to reproduce it, because it's
> falling over pretty easily for me. After poking at it for a while,
> I am of the opinion that scanGetItem's handling of multiple keys is
> fundamentally broken and needs to be rewritten completely. The
> particular case I'm seeing here is that one key returns this sequence of
> TIDs/lossy flags:
>
> ...
> 1085/4 0
> 1086/65535 1
> 1087/4 0
> ...
>
> while the other one returns this:
>
> ...
> 1083/11 0
> 1086/6 0
> 1086/10 0
> 1087/10 0
> ...
>
> and what comes out of scanGetItem is just
>
> ...
> 1086/6 1
> ...
>
> because after returning that, on the next call it advances both input
> keystreams. So 1086/10 should be visited and is not.
>
> I think that depending on the previous entryRes state to determine what
> to do is basically unworkable, and what should probably be done instead
> is to remember the last-returned TID and advance keystreams with TIDs <=
> that. I haven't quite thought through how that should interact with
> lossy-page TIDs but it seems more robust than what we've got.
>
> I'm also noticing that the ANDing behavior for the "ee:* & dd:*" query
> style seems very much stupider than it needs to be --- it's returning
> lossy pages that very obviously don't need to be examined because the
> other keystream has no match at all on that page. But I haven't had
> time to probe into the reason why.
>
> I'm out of time for today, do you want to work on it?
>
> regards, tom lane
>
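The fix Tom proposes above (remember the last-returned TID and advance every keystream whose TID is <= that) can be illustrated with a small sketch. This is not the PostgreSQL code and not a patch: the function name and_merge, the tuple layout (block, offset, lossy) and the constant LOSSY_OFFSET are hypothetical names chosen for illustration. Tom says he has not thought through the lossy-page interaction, so the rule used here is purely an assumption: a lossy entry (offset 65535) matches any offset on its page, and a result carries the flag 1 whenever any contributing key was lossy. The input data is only the excerpt quoted in the message; the elided "..." parts are left out.

```python
from typing import List, Optional, Tuple

LOSSY_OFFSET = 65535              # offset used for a whole-page (lossy) entry
Item = Tuple[int, int, bool]      # (block, offset, lossy flag); hypothetical layout


def and_merge(streams: List[List[Item]]) -> List[Item]:
    """AND together key streams, each sorted by (block, offset)."""
    pos = [0] * len(streams)
    last: Optional[Tuple[int, int]] = None   # last-returned TID
    out: List[Item] = []

    def cur(i: int) -> Optional[Item]:
        return streams[i][pos[i]] if pos[i] < len(streams[i]) else None

    while True:
        # Advance only the streams whose current TID is <= the last-returned
        # TID. A lossy entry sorts after every exact TID on its page, so it
        # stays current while the other key walks through that page.
        if last is not None:
            for i in range(len(streams)):
                while cur(i) is not None and cur(i)[:2] <= last:
                    pos[i] += 1

        # Find the next TID that every stream agrees on.
        while True:
            items = [cur(i) for i in range(len(streams))]
            if any(it is None for it in items):
                return out                    # one key is exhausted: done
            block = max(it[0] for it in items)
            exact = [it[1] for it in items if it[0] == block and not it[2]]
            target = max(exact) if exact else LOSSY_OFFSET
            # A stream is behind if it is on an earlier page, or if it has an
            # exact TID below the target on this page (assumed lossy rule:
            # a lossy entry on this page is never behind).
            behind = [i for i, it in enumerate(items)
                      if it[0] < block or (not it[2] and it[1] < target)]
            if not behind:
                out.append((block, target, any(it[2] for it in items)))
                last = (block, target)
                break
            for i in behind:
                pos[i] += 1


# The two key streams from the excerpt quoted above.
key_a = [(1085, 4, False), (1086, LOSSY_OFFSET, True), (1087, 4, False)]
key_b = [(1083, 11, False), (1086, 6, False), (1086, 10, False), (1087, 10, False)]

print(and_merge([key_a, key_b]))
# [(1086, 6, True), (1086, 10, True)]
```

With this rule 1086/10 is visited, because only the second key is advanced after 1086/6 is returned. Under the same assumptions, a lossy page from one key is skipped outright when the other key has no entry on that page, which relates to the second point Tom raises.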
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83