From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Hannu Krosing <hannu(at)tm(dot)ee> |
Cc: | Urmo <urmo(at)xwm(dot)ee>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Searching for substring with tsearch(1/2) |
Date: | 2003-12-10 09:20:14 |
Message-ID: | 3FD6E54E.7080109@sigaev.ru |
Lists: | pgsql-hackers |
>>Tsearch was never intended for prefix search, and its index structure doesn't support
>>any kind of prefix or suffix search. But you can write an extension to tsearch that
>>searches by prefix. Such a solution will not use the index, only a sequential scan.
>
>
> How efficient would tsearch be for really big expressions, where 'hu%'
> would be expanded (using a btree word index on a one-column word table) to
> the tsearch equivalent of ( "human" or "humanity" or "humming" or "huge" or
> ..1000 words here...) before passing the expression to tsearch?
The tsearch GiST index doesn't support prefix search, so it will work only by
seqscan. As we know :), disk is much slower than the processor, so speed will be
limited by disk.
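
To make the expansion step in the question concrete, here is a rough sketch in Python rather than SQL. It assumes the word list is held in sorted order, which stands in for the btree word index; the names `expand_prefix` and `to_or_query` are made up for illustration, and `|` is assumed to be the OR operator of the tsearch query string. It only shows how 'hu%' turns into one big OR expression and says nothing about how fast tsearch then evaluates it.

```python
from bisect import bisect_left


def expand_prefix(sorted_words, prefix):
    """Return every word in sorted_words that starts with prefix.

    sorted_words must be sorted; the binary search plays the role of
    a btree range scan starting at the prefix.
    """
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # past the end of the prefix range
        matches.append(word)
    return matches


def to_or_query(words):
    """Join the words into one OR expression (assumed '|' operator)."""
    return " | ".join(words)


# Hypothetical word table, sorted as a btree index would keep it.
words = sorted(["huge", "human", "humanity", "humming", "hut", "ice", "home"])
expanded = expand_prefix(words, "hu")
query = to_or_query(expanded)
```

With a real vocabulary the expanded list can run to thousands of words, which is the case the question is about.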
>>Prefix search is easily implemented with an inverted index, but it requires a lot of
>>programming.
>>The simplest way is:
>>create table invidx (
>> lexeme text not null primary key,
>> ids int[]
>>);
>>
>>where ids is an array of identifiers of the documents that contain this word.
>
>
> How hard (or sensible ;) would it be to create such an index using GiST?
> As proved by tsearch, GiST can cope well with many-to-many indexes.
Sorry, I don't understand. Do you mean that GiST supports one heap tuple in
several index tuples? If so, then no :). GiST doesn't support this feature, and I
don't think that GiST can help in this situation.
> create table invidx (
> lexeme text not null,
> textdate date not null,
> ids int[],
> primary key (lexeme, textdate)
> );
>
> which would partition the invidx table on textdate (or some other
> suitable datum)
>
>
>>2 If the word is frequent, then a query with 'IN (select * from func())' may run slowly...
> if it is often too slow then creating a temp table and doing a plain
> join may be faster.
A table structure like invidx reduces this problem.
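
As an illustration only, the invidx idea quoted above can be modelled in a few lines of Python: a map from lexeme to the set of document ids, plus a sorted list of lexemes for the prefix range scan. The class and method names are invented for this sketch, and it ignores everything a real implementation has to deal with (updates on disk, concurrency, very long id arrays).

```python
from bisect import bisect_left


class InvertedIndex:
    """In-memory stand-in for the invidx table: lexeme -> document ids."""

    def __init__(self):
        self.postings = {}    # lexeme -> set of document ids
        self._sorted = None   # sorted lexemes, rebuilt lazily after changes

    def add(self, doc_id, lexemes):
        """Record that document doc_id contains each of the lexemes."""
        for lexeme in lexemes:
            self.postings.setdefault(lexeme, set()).add(doc_id)
        self._sorted = None

    def prefix_search(self, prefix):
        """Return sorted ids of documents containing a word with this prefix."""
        if self._sorted is None:
            self._sorted = sorted(self.postings)
        i = bisect_left(self._sorted, prefix)
        result = set()
        # Walk the contiguous run of lexemes sharing the prefix and
        # union their id sets, so each document appears once.
        while i < len(self._sorted) and self._sorted[i].startswith(prefix):
            result |= self.postings[self._sorted[i]]
            i += 1
        return sorted(result)


idx = InvertedIndex()
idx.add(1, ["human", "rights"])
idx.add(2, ["huge", "house"])
idx.add(3, ["ice"])
hits = idx.prefix_search("hu")
```

The point of the structure is that only the lexemes in the prefix range are touched, instead of every document.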
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru