From: "Gregory Maxwell" <gmaxwell(at)gmail(dot)com>
To: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Darcy Buskermolen" <darcyb(at)commandprompt(dot)com>, "PgSQL General" <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Index greater than 8k
Date: 2006-11-01 20:46:51
Message-ID: e692861c0611011246g6be68bf5x27f46ade534e2da@mail.gmail.com
Lists: pgsql-general pgsql-hackers
On 11/1/06, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
[snip]
> Brain storm method:
>
> Develop a dictionary which returns all substrings of a lexeme; for example, for the
> word foobar it would return 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob
> bar ba ar'. And make a GIN functional index over your column (to save disk space).
[snip]
> Search time in GIN depends only weakly on the number of words (unlike
> tsearch2/GiST), but row insertion may be rather slow....
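
For illustration, a minimal Python sketch of the substring set such a dictionary would emit (the real thing would be a C dictionary template plugged into tsearch2; the minimum substring length of 2 is only inferred from the foobar example above):

def all_substrings(lexeme, min_len=2):
    # Every substring of length >= min_len, longest first per start position.
    n = len(lexeme)
    return [lexeme[i:j]
            for i in range(n)
            for j in range(n, i + min_len - 1, -1)]

print(' '.join(all_substrings('foobar')))
# foobar fooba foob foo fo oobar ooba oob oo obar oba ob bar ba ar

Feeding that list through a functional GIN index means a substring search could, in principle, be answered by a single equality lookup on the pattern itself.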
With the right folding, the number of possible trigrams for ASCII text
is fairly small, much smaller than the number of words used in a large
corpus of text, so GIN search performance should be pretty good.
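
A rough sketch of that bound (the folding and padding here are assumptions modeled loosely on contrib/pg_trgm, not its actual code):

import re

def fold(text):
    # Assumed folding: lowercase, collapse non-alphanumerics to single spaces.
    return re.sub(r'[^a-z0-9]+', ' ', text.lower()).strip()

def trigrams(word):
    # Pad the word (two leading spaces, one trailing) so word boundaries
    # yield distinct trigrams, then slide a 3-character window.
    padded = '  ' + word + ' '
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

print(sorted(trigrams(fold('Foobar'))))
# ['  f', ' fo', 'ar ', 'bar', 'foo', 'oba', 'oob']

With a folded alphabet of 26 letters, 10 digits, and the space, there are at most 37**3 = 50,653 distinct trigrams, regardless of corpus size.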
Real magic would be to teach the regex operator to transparently make
use of such an index. ;)