Re: GIN index build speed

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN index build speed
Date: 2008-12-02 12:37:44
Message-ID: 49352C18.30702@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> The issue is that the GIN index build code accumulates the lexemes into
> a binary tree, but there's nothing to keep the tree balanced. My test
> case with almost monotonically increasing keys, happens to be a
> worst-case scenario, and the tree degenerates into almost linked list
> that every insertion has to grovel through.
Agree, but in most cases it works well. Because lexemes in documents aren't ordered.

>
> The obvious fix is to use a balanced tree algorithm. I wrote a quick
> patch to turn the tree into a splay tree. That fixed the degenerative
> behavior, and the runtime of CREATE INDEX for the above test case fell
> from 40s to 1.5s.
BTW, your patch helps to GIN's btree emulation. With typical scenarios of usage
of btree emulation scalar column will be more or less ordered.

>
> Magnus kindly gave me a dump of the full-text-search tables from
> search.postgresql.org, for some real-world testing. Quick testing with
> that suggests that the patch unfortunately makes the index build 5-10%
> slower with that data set.
Do you see ways to improve that?
>
> We're in commitfest, not supposed to be submitting new features, so I'm
> not going to pursue this further right now. Patch attached, however,
> which seems to work fine.
Personally, I don't object to improve that.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannes Eder 2008-12-02 12:44:36 WIP: The Skyline Operator
Previous Message Fujii Masao 2008-12-02 12:37:12 Re: Sync Rep: First Thoughts on Code