Quick Links

Re: GIN index build speed

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GIN index build speed
Date:	2008-12-02 12:37:44
Message-ID:	49352C18.30702@sigaev.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> The issue is that the GIN index build code accumulates the lexemes into
> a binary tree, but there's nothing to keep the tree balanced. My test
> case with almost monotonically increasing keys, happens to be a
> worst-case scenario, and the tree degenerates into almost linked list
> that every insertion has to grovel through.
Agree, but in most cases it works well. Because lexemes in documents aren't ordered.

>
> The obvious fix is to use a balanced tree algorithm. I wrote a quick
> patch to turn the tree into a splay tree. That fixed the degenerative
> behavior, and the runtime of CREATE INDEX for the above test case fell
> from 40s to 1.5s.
BTW, your patch helps to GIN's btree emulation. With typical scenarios of usage
of btree emulation scalar column will be more or less ordered.

>
> Magnus kindly gave me a dump of the full-text-search tables from
> search.postgresql.org, for some real-world testing. Quick testing with
> that suggests that the patch unfortunately makes the index build 5-10%
> slower with that data set.
Do you see ways to improve that?
>
> We're in commitfest, not supposed to be submitting new features, so I'm
> not going to pursue this further right now. Patch attached, however,
> which seems to work fine.
Personally, I don't object to improve that.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

GIN index build speed at 2008-12-02 10:12:35 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hannes Eder	2008-12-02 12:44:36	WIP: The Skyline Operator
Previous Message	Fujii Masao	2008-12-02 12:37:12	Re: Sync Rep: First Thoughts on Code