From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit |
Date: | 2008-03-07 13:56:40 |
Message-ID: | 47D14998.3080304@sigaev.ru |
Lists: | pgsql-bugs pgsql-patches |
To be precise about tsvector:
1) A GiST index is lossy for every kind of tsearch query. A GIN index is not
lossy for the @@ operator, but is lossy for @@@.
2) The number of positions stored per word is limited to 256. More positions
do not help ranking, but they do produce a big tsvector. If a word has that many
positions in a document, it is close to being a stopword. We could easily
increase this limit to 65536 positions.
3) The maximum value of a position is 2^14, because positions are stored in a
uint16 and 2 bits of that integer must be reserved for the position's weight.
It is possible to widen uint16 to uint32, but that would double the tsvector
size, which is impractical, I suppose. So the part of a document used for
ranking is its first 16384 words, that is, roughly the first 50-100 kilobytes.
4) The limit on the total size of a tsvector comes from the WordEntry->pos
field (ts_type.h). It holds the number of bytes between the first lexeme in the
tsvector and the lexeme in question. So the limit applies to the total length
of the lexemes plus their positional information.
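
To make points 3) and 4) concrete, here is a minimal sketch in C of how a
weight and a position can share one uint16, and of how a fixed-width byte
offset caps the total size. All names below (pack_pos, unpack_weight,
unpack_pos, offset_fits and the macros) are hypothetical and are not taken
from ts_type.h. Putting the weight in the top two bits, clamping oversized
positions, and the 20-bit offset width are assumptions; the last one is chosen
only because it matches the 1 MB limit named in the subject line.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: top 2 bits = weight, low 14 bits = position. */
#define POS_BITS     14
#define POS_MASK     ((1u << POS_BITS) - 1u)     /* 0x3fff = 16383 */

/* Assumed width of the byte-offset field; 2^20 bytes = 1 MB. */
#define OFFSET_BITS  20
#define MAX_OFFSET   ((1u << OFFSET_BITS) - 1u)  /* 1048575 */

/* Pack a weight (0..3) and a word position into one 16-bit value.
 * Positions that do not fit in 14 bits are clamped to the largest
 * representable value (an assumption of this sketch). */
static uint16_t pack_pos(unsigned weight, unsigned pos)
{
    assert(weight <= 3u);
    if (pos > POS_MASK)
        pos = POS_MASK;
    return (uint16_t)((weight << POS_BITS) | pos);
}

/* Extract the 2-bit weight from a packed value. */
static unsigned unpack_weight(uint16_t packed)
{
    return (unsigned)packed >> POS_BITS;
}

/* Extract the 14-bit position from a packed value. */
static unsigned unpack_pos(uint16_t packed)
{
    return (unsigned)packed & POS_MASK;
}

/* Return 1 if a lexeme starting byte_offset bytes after the first
 * lexeme can still be addressed by the offset field, 0 otherwise. */
static int offset_fits(uint32_t byte_offset)
{
    return byte_offset <= MAX_OFFSET;
}
```

With this layout, pack_pos(3, 20000) stores position 16383 rather than 20000,
which is why words beyond roughly the first 16384 cannot be told apart for
ranking. Likewise, offset_fits() starts returning 0 once the lexemes and
positional data before a lexeme exceed the addressable range.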
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/