From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Cc: | Patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: Partial match in GIN |
Date: | 2008-04-10 11:53:32 |
Message-ID: | 47FDFFBC.4060302@sigaev.ru |
Lists: | pgsql-patches |
> How about forcing the use of a bitmap index scan, and modifying the indexam
> API so that GIN could return a lossy bitmap, and let the bitmap heap
> scan do the rechecking?
Partial match might be used for only one search entry out of many. In a text search
example, 'a:* & qwertyuiop', the second lexeme has only a few matching tuples. But
GIN itself doesn't know the semantic meaning of the operation and cannot
distinguish the following tsqueries:
'!a:* & qwertyuiop'
'a:* & !qwertyuiop'
So your suggestion is equivalent to marking all operations with the RECHECK flag and
OR-ing all posting lists. That will give a lot of false matches and be too slow.
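
To make the cost concrete, the following is a toy Python model, not GIN code. The heap contents, the posting lists, and all helper names are made up for illustration. It only sketches why OR-ing the posting lists and leaving the decision to a recheck produces far more candidates than combining the lists according to the query's operation.

```python
# Toy model, not PostgreSQL code: posting lists are sets of tuple ids.
# All data and names below are hypothetical.

# Heap: tuple id -> set of lexemes in that document.
heap = {
    1: {"apple", "qwertyuiop"},
    2: {"apple"},
    3: {"avocado"},
    4: {"banana", "qwertyuiop"},
    5: {"ant"},
}

def posting_list(pred):
    """Tuple ids that contain at least one lexeme satisfying pred."""
    return {tid for tid, lexemes in heap.items()
            if any(pred(lex) for lex in lexemes)}

prefix_a = posting_list(lambda lex: lex.startswith("a"))  # entries for 'a:*'
exact_q = posting_list(lambda lex: lex == "qwertyuiop")   # entry 'qwertyuiop'

def matches_query(lexemes):
    """Recheck of 'a:* & qwertyuiop' against a heap tuple."""
    return (any(lex.startswith("a") for lex in lexemes)
            and "qwertyuiop" in lexemes)

# The operation is known: intersect the lists, few candidates remain.
candidates_and = prefix_a & exact_q

# The operation is unknown: OR everything, then recheck every candidate.
candidates_or = prefix_a | exact_q
rechecked = {tid for tid in candidates_or if matches_query(heap[tid])}
false_matches = candidates_or - rechecked
```

In this made-up data set the intersection yields a single candidate, while the OR-ed list covers every tuple in the heap and all but one of them are discarded by the recheck.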
>
>>> I don't think the storage size of tsquery matters much, so whatever
>>> is the best solution in terms of code readability etc.
>> That was about the tsquerysend/recv format, not on-disk storage. We
>> don't require compatibility of the binary format of the db's files, but I have
>> some doubts about binary dumps.
>
> We generally don't make any promises about cross-version compatibility
> of binary dumps, though it would be nice not to break it if it's not too
> much effort.
>
>>> Hmm. match_special_index_operator() already checks that the index's
>>> opfamily is pattern_ops, or text_ops with C-locale. Are you reusing
>>> the same operator families for wildspeed? Doesn't it then also get
>>> confused if you do a "WHERE textcol > 'foo'" query by hand?
>> No, wildspeed uses the same operator ~~.
>> match_special_index_operator() isn't called at all: in the
>> match_clause_to_indexcol() function, is_indexable_operator() is called
>> before match_special_index_operator() and returns true.
>>
>> expand_indexqual_opclause() sees that the operator is OID_TEXT_LIKE_OP
>> and calls prefix_quals(), which fails because it accepts only a few
>> B-tree opfamilies.
>
> Oh, I see. So this assumption mentioned in the comment there:
>
> /*
> * LIKE and regex operators are not members of any index opfamily,
> * so if we find one in an indexqual list we can assume that it
> * was accepted by match_special_index_operator().
> */
>
> is no longer true with wildspeed. So we do need to check that in
> expand_indexqual_opclause() then.
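
The check order described above can be sketched as a toy Python model. This is not PostgreSQL source: the functions only borrow the names used in this thread, the opfamily names `text_ops_c_locale` and `wildspeed_ops` and the operator table are hypothetical stand-ins, and the `check_opfamily` switch is just one assumed shape for the extra check, not the actual patch.

```python
# Toy model of the check order discussed in this thread; not PostgreSQL code.
# Opfamily names and the operator table are hypothetical stand-ins.

BTREE_PREFIX_OPFAMILIES = {"text_pattern_ops", "text_ops_c_locale"}

# Operators that are members of each opfamily. In this model wildspeed
# registers LIKE (~~) as an ordinary member of its own opfamily.
OPFAMILY_MEMBERS = {
    "text_ops_c_locale": {"<", "<=", "=", ">=", ">"},
    "text_pattern_ops": {"~<~", "~<=~", "=", "~>=~", "~>~"},
    "wildspeed_ops": {"~~"},
}

def is_indexable_operator(op, opfamily):
    return op in OPFAMILY_MEMBERS.get(opfamily, set())

def match_special_index_operator(op, opfamily):
    # Accepts LIKE only for B-tree opfamilies that support prefix scans.
    return op == "~~" and opfamily in BTREE_PREFIX_OPFAMILIES

def match_clause(op, opfamily):
    """Return which check accepted the clause, or None if neither did."""
    if is_indexable_operator(op, opfamily):
        return "member"   # the special-operator check is never reached
    if match_special_index_operator(op, opfamily):
        return "special"
    return None

def expand_qual(op, opfamily, check_opfamily):
    """Model of qual expansion; check_opfamily is the assumed extra check."""
    if op != "~~":
        return "unchanged"
    if check_opfamily and is_indexable_operator(op, opfamily):
        return "unchanged"   # ordinary member operator, nothing to expand
    if opfamily not in BTREE_PREFIX_OPFAMILIES:
        # Models prefix_quals() failing on an opfamily it does not expect.
        raise ValueError("unexpected opfamily: " + opfamily)
    return "prefix range quals"
```

With `check_opfamily=False`, a LIKE clause on the wildspeed stand-in is accepted as a member operator and then fails during expansion; with `check_opfamily=True` it passes through unchanged, while the B-tree case still expands into prefix range quals.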
>
>>>> NOTICE 2: it seems to me that a similar technique could be
>>>> implemented for ordinary B-tree to eliminate the hack around LIKE support.
>>> LIKE expression. I wonder what the size and performance of that would
>>> be like, in comparison to the proposed GIN solution?
>>
>> GIN speeds up '%foo%' too, which is impossible for B-tree. But I don't
>> like the hack around LIKE support in B-tree. This support goes through
>> a side path instead of the regular one.
>
> You could satisfy '%foo%' using a regular and a reverse B-tree index,
> and a bitmap AND. Which is interestingly similar to the way you proposed
> to use a TIDBitmap within GIN.
>
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/