Re: Partial match in GIN

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Partial match in GIN
Date: 2008-04-08 10:54:09
Message-ID: 47FB4ED1.7060609@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

> Looking at the patch, you require that the TIDBitmap fits in work_mem in
> non-lossy format. I don't think that's acceptable, it can easily exceed
> work_mem if you search for some very common word. Failing to execute a
> valid query is not good.
But way is better than nothing. In really, that way was chosen to have fast
merge of (potentially) hundreds of sorted lists of ItemPointers. Other ways is
much slower.

Some calculations: with 8Mb of mem_work TIDBimap in non-lossy mode can store at
least 200000 pages, which gives to us no less than 200000 tuples. For frequent
word, that number should multiplied to 10 or 100, because practically every
tuple will contain it. Practical limit to number of articles/document served by
one servers is about 10 millions.

There are no so many alternatives:
- collect all needed ItemPointers and sort then unique them.
- merge each posting list with already collected ones
- N-way merge, where N can be very big
- Rerun index scan with all possible combinations

All this ways will be much slower even for not very big collections.

> I don't think the storage size of tsquery matters much, so whatever is
> the best solution in terms of code readability etc.
That was about tsqueryesend/recv format? not a storage on disk. We don't require
compatibility of binary format of db's files, but I have some doubts about
binary dump.

>
> Hmm. match_special_index_operator() already checks that the index's
> opfamily is pattern_ops, or text_ops with C-locale. Are you reusing the
> same operator families for wildspeed? Doesn't it then also get confused
> if you do a "WHERE textcol > 'foo'" query by hand?
No, wildspeed use the same operator ~~
match_special_index_operator() isn't called at all: in
match_clause_to_indexcol() function is_indexable_operator() is called before
match_special_index_operator() and returns true.

expand_indexqual_opclause() sees that operation is a OID_TEXT_LIKE_OP and calls
prefix_quals() which fails because it wishes only several Btree opfamilies.

>
>> NOTICE 2: it seems to me, that similar technique could be implemented
>> for ordinary BTree to eliminate hack around LIKE support.
> LIKE expression. I wonder what the size and performance of that would be
> like, in comparison to the proposed GIN solution?

GIN speeds up '%foo%' too - which is impossible for btree. But I don't like a
hack around LIKE support in BTree. This support uses outflank ways missing
regular one.

I'm thinking about add new strategy to Btree and allow directly support of
prefix LIKE search. And BTree will scan index while compare method with option
returns true.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2008-04-08 12:23:13 Re: Indexam API changes
Previous Message Zoltan Boszormenyi 2008-04-08 09:09:39 Re: [HACKERS] TRUNCATE TABLE with IDENTITY