From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GIN index build speed |
Date: | 2008-12-02 12:37:44 |
Message-ID: | 49352C18.30702@sigaev.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> The issue is that the GIN index build code accumulates the lexemes into
> a binary tree, but there's nothing to keep the tree balanced. My test
> case with almost monotonically increasing keys, happens to be a
> worst-case scenario, and the tree degenerates into almost linked list
> that every insertion has to grovel through.
Agree, but in most cases it works well. Because lexemes in documents aren't ordered.
>
> The obvious fix is to use a balanced tree algorithm. I wrote a quick
> patch to turn the tree into a splay tree. That fixed the degenerative
> behavior, and the runtime of CREATE INDEX for the above test case fell
> from 40s to 1.5s.
BTW, your patch helps to GIN's btree emulation. With typical scenarios of usage
of btree emulation scalar column will be more or less ordered.
>
> Magnus kindly gave me a dump of the full-text-search tables from
> search.postgresql.org, for some real-world testing. Quick testing with
> that suggests that the patch unfortunately makes the index build 5-10%
> slower with that data set.
Do you see ways to improve that?
>
> We're in commitfest, not supposed to be submitting new features, so I'm
> not going to pursue this further right now. Patch attached, however,
> which seems to work fine.
Personally, I don't object to improve that.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
From | Date | Subject | |
---|---|---|---|
Next Message | Hannes Eder | 2008-12-02 12:44:36 | WIP: The Skyline Operator |
Previous Message | Fujii Masao | 2008-12-02 12:37:12 | Re: Sync Rep: First Thoughts on Code |