From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Limitation on number of positions (tsearch) |
Date: | 2007-09-13 11:59:30 |
Message-ID: | 46E92622.5030601@sigaev.ru |
Lists: | pgsql-hackers |
> Why is there a limitation of 256 positions per lexeme in a tsvector?
> There doesn't seem to be a technical reason for that. WordEntryPosVector
> uses a uint16 to store the number of positions, so it could go up to 65535.
For two reasons:
- Ranking might become very slow if the number of positions is large.
- From practice: if a word is very frequent in a document, then with high
probability it is a stop word or (in the case of internet-wide search engines)
the document is spam.
It is common practice for search engines to limit the number of positions per
word, because raising the limit gives no advantage in terms of ranking
and causes trouble by increasing storage size.
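
To illustrate the idea of capping positions, here is a minimal sketch in C. It is not the actual tsearch code: `PosVector`, `MAX_POSITIONS`, `posvector_add` and `posvector_fill` are hypothetical names chosen for this example, and the struct is only a stand-in for the real WordEntryPosVector layout. It assumes positions are stored as 16-bit values and that occurrences beyond the cap are simply dropped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, mirroring the 256-position limit discussed above. */
#define MAX_POSITIONS 256

/*
 * Hypothetical stand-in for a per-lexeme position list.
 * This is NOT the real WordEntryPosVector layout.
 */
typedef struct
{
    uint16_t npos;                /* number of positions stored so far */
    uint16_t pos[MAX_POSITIONS];  /* word positions within the document */
} PosVector;

/*
 * Append one position unless the cap has been reached.
 * Returns 1 if the position was stored, 0 if it was dropped.
 */
static int
posvector_add(PosVector *v, uint16_t position)
{
    if (v->npos >= MAX_POSITIONS)
        return 0;
    v->pos[v->npos++] = position;
    return 1;
}

/*
 * Reset the vector and record the occurrences of one lexeme.
 * Returns how many occurrences were dropped because of the cap.
 */
static size_t
posvector_fill(PosVector *v, const uint16_t *positions, size_t n)
{
    size_t dropped = 0;

    v->npos = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (!posvector_add(v, positions[i]))
            dropped++;
    }
    return dropped;
}
```

In this sketch a lexeme that occurs 300 times keeps its first 256 positions and drops the remaining 44, so the position array never exceeds 256 * 2 = 512 bytes per lexeme. If the uint16 counter were the only limit, the same array could grow to 65535 * 2 bytes, roughly 128 KB, for a single very frequent word.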
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/