From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Hannu Krosing <hannu(at)tm(dot)ee> |
Cc: | Urmo <urmo(at)xwm(dot)ee>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Searching for substring with tsearch(1/2) |
Date: | 2003-12-10 10:06:11 |
Message-ID: | 3FD6F013.6050906@sigaev.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> I meant that the expansion of 'hu%' is done before and outside of
> tsearch, so the question is how efficient will tsearch be for searching
> for hudreds or thousands of words in one expression.
Ok, I see. The answer - bad. Index structure is signature tree with constant
signature length, by default 2016 bits. Siganture makes by hashing word and sets
bits number HASHVAL % 2016 to 1. So, if query has many terms and all terms are
ored then there is a lot of signatures that matched by query. This means a lot
of pages in index will be readed.
>>>How hard (or sensible ;) would be creating such an index using GiST ?
>>>As proved by tsearch GiST can cope well with many-to-many indexes.
>>
>>Sorry, I don't understand. Do you mean that GiST supports one heap tuple in
>>several index tuple? If yes then no :). GiST doesn't support this feature. I
>>don't think that GiST may help in this situation.
>
>
> but tsearch seems to support this, and tsearch uses GiST. Is this
> functionality added entirely by tsearch ?
No, one heap tuple - one index tuple.
I'll try to explain index structure used by tsearch (three levels just for example):
Root page
internal tuple 1 -> second level page 1
internal tuple 1.1 ->
internal tuple 1.2 ->
internal tuple 2 -> second level page 2
internal tuple 2.1 ->
internal tuple 2.2 -> third level (leaf) page 2.2
leaf tuple 2.2.1 -> heap tuple
leaf tuple 2.2.2 -> heap tuple
leaf tuple contains one of two types of predicats:
1 just lexemes (without psition information)
2 if store size of first type is too big then tuple
stores signature as described above.
internal tuple contains ored (super-imposed) signatures of childs.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
From | Date | Subject | |
---|---|---|---|
Next Message | Andreas Pflug | 2003-12-10 11:21:15 | Re: Cannot add an column of type serial or bigserial |
Previous Message | Hannu Krosing | 2003-12-10 09:34:23 | Re: Searching for substring with tsearch(1/2) |