| From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
|---|---|
| To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: GIN index build speed |
| Date: | 2008-12-02 12:37:44 |
| Message-ID: | 49352C18.30702@sigaev.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> The issue is that the GIN index build code accumulates the lexemes into
> a binary tree, but there's nothing to keep the tree balanced. My test
> case with almost monotonically increasing keys, happens to be a
> worst-case scenario, and the tree degenerates into almost linked list
> that every insertion has to grovel through.
Agree, but in most cases it works well. Because lexemes in documents aren't ordered.
>
> The obvious fix is to use a balanced tree algorithm. I wrote a quick
> patch to turn the tree into a splay tree. That fixed the degenerative
> behavior, and the runtime of CREATE INDEX for the above test case fell
> from 40s to 1.5s.
BTW, your patch helps to GIN's btree emulation. With typical scenarios of usage
of btree emulation scalar column will be more or less ordered.
>
> Magnus kindly gave me a dump of the full-text-search tables from
> search.postgresql.org, for some real-world testing. Quick testing with
> that suggests that the patch unfortunately makes the index build 5-10%
> slower with that data set.
Do you see ways to improve that?
>
> We're in commitfest, not supposed to be submitting new features, so I'm
> not going to pursue this further right now. Patch attached, however,
> which seems to work fine.
Personally, I don't object to improve that.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hannes Eder | 2008-12-02 12:44:36 | WIP: The Skyline Operator |
| Previous Message | Fujii Masao | 2008-12-02 12:37:12 | Re: Sync Rep: First Thoughts on Code |