From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing hash index build time
Date: 2007-03-21 21:25:36
Message-ID: 200703212125.l2LLPaA02230@momjian.us
Lists: pgsql-hackers
Added to TODO:
o During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
---------------------------------------------------------------------------
Tom Lane wrote:
> I wrote:
> > I'm not sure if this has been discussed before, but I suddenly realized
> > while responding to the above message that the reason for the awful
> > performance is pretty obvious: hashbuild starts with a minimum-size
> > index (two buckets) and repeatedly splits buckets as insertions are
> > done, exactly the same as ordinary dynamic growth of the index would do.
> > This means that for an N-row table, approximately N/entries-per-bucket
> > splits need to occur during index build, which results in roughly O(N^2)
> > performance because we have to reprocess already-inserted entries over
> > and over.
>
> Well, unfortunately this theory seems to be all wet. Given that the
> bucket loading is reasonably even, the time to split a bucket is about
> constant and so there's no O(N^2) effect. (The multiplier hidden inside
> O(N) is pretty awful, but it doesn't change with N.)
>
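A quick sanity check of that claim, for the archives. This is a toy model
I wrote for illustration, not anything from the hash AM code: even if we
pessimistically assume that every doubling of the bucket count
redistributes all entries present at that moment, the total work is a
geometric series bounded by about 2N, i.e. linear:

    #include <stdio.h>

    /*
     * Toy model (not the real code): assume the index doubles its
     * bucket count whenever the average bucket fills up, and that
     * each doubling redistributes every entry present at that moment.
     */
    int
    main(void)
    {
        long    n = 10 * 1000 * 1000;   /* rows in the table */
        long    ffactor = 10;           /* entries per bucket */
        long    buckets = 2;
        long    moved = 0;
        long    entries;

        for (entries = 1; entries <= n; entries++)
        {
            if (entries > buckets * ffactor)
            {
                moved += entries;       /* redistribute everything once */
                buckets *= 2;
            }
        }
        /* moved/n stays near 2 no matter how large n gets */
        printf("entries moved: %ld (%.2fx of N)\n",
               moved, (double) moved / n);
        return 0;
    }

The real code splits one bucket at a time rather than doubling wholesale,
but the bound comes out the same either way.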
> The real reason why performance falls off a cliff for building large
> hash indexes seems to be much harder to fix: basically, once the size
> of your index exceeds working memory, it's nap time. Given that the
> incoming data has randomly distributed hash values, each bucket is about
> as likely to be touched next as any other; there is no locality of
> access and so the "working set" is the same size as the index. Once it
> doesn't fit in RAM anymore you're into swap hell.
>
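To make the lack of locality concrete, another toy simulation (again my
illustration, not code from the tree): count how many distinct buckets a
run of random-hash insertions touches. With uniform hashes nearly every
bucket gets hit, so the working set really is the whole index:

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        int     nbuckets = 100000;
        int     window = 1000000;   /* recent insertions to examine */
        char   *touched = calloc(nbuckets, 1);
        int     distinct = 0;
        int     i;

        srandom(42);
        for (i = 0; i < window; i++)
        {
            /* random() is POSIX; stands in for a uniform hash value */
            int     b = random() % nbuckets;

            if (!touched[b])
            {
                touched[b] = 1;
                distinct++;
            }
        }
        printf("%d of %d buckets touched by %d insertions\n",
               distinct, nbuckets, window);
        free(touched);
        return 0;
    }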
> The only way I can see to fix that is to try to impose some locality of
> access during the index build. This is not impossible: for example,
> given a choice for the number of buckets, we could sort all the index
> tuples by hashed bucket number and then start inserting. btree does a
> preliminary sort, and its index build times are way more reasonable
> than hash's currently are, so the cost of the sort isn't outrageous.
> (I note this is mainly because we know how to do sorting with locality
> of access...) Before we start inserting we will know exactly how many
> tuples there are, so we can pre-create the right number of buckets and
> be sure that no on-the-fly splits will be needed for the rest of the
> build. If we guessed wrong about the number of buckets there will be
> some places in the process where we concurrently insert into several
> buckets, not just one, or perhaps come back to a bucket that we touched
> earlier, but that still maintains plenty of locality of access.
>
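Sketching the idea in entirely hypothetical C, with IndexTuple,
bucket_insert(), and the bucket-number arithmetic as made-up stand-ins
for the real structures and routines:

    #include <stdlib.h>
    #include <stdint.h>

    typedef struct
    {
        uint32_t    hash;       /* hash of the indexed key */
        uint32_t    bucket;     /* bucket chosen for this tuple */
        /* ... pointer to heap tuple, etc. ... */
    } IndexTuple;

    /* stand-in for writing the tuple into the bucket's page(s) */
    static void
    bucket_insert(uint32_t bucket, const IndexTuple *tup)
    {
        (void) bucket;
        (void) tup;
    }

    static int
    bucket_cmp(const void *a, const void *b)
    {
        uint32_t    ba = ((const IndexTuple *) a)->bucket;
        uint32_t    bb = ((const IndexTuple *) b)->bucket;

        return (ba > bb) - (ba < bb);
    }

    /* Known tuple count: pre-create buckets, sort, insert in order. */
    void
    hash_build_sorted(IndexTuple *tuples, size_t ntuples, size_t ffactor)
    {
        /* enough buckets that no on-the-fly splits are needed;
         * the real code would want a power of two, modulo a mask */
        size_t      nbuckets = (ntuples + ffactor - 1) / ffactor;
        size_t      i;

        for (i = 0; i < ntuples; i++)
            tuples[i].bucket = tuples[i].hash % nbuckets;

        /* sort by bucket so insertions sweep the index exactly once */
        qsort(tuples, ntuples, sizeof(IndexTuple), bucket_cmp);

        for (i = 0; i < ntuples; i++)
            bucket_insert(tuples[i].bucket, &tuples[i]);
    }

Presumably the sort step would go through tuplesort so it spills to disk
gracefully, the same way the btree build already does.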
> This is looking like more work than I want to do in the near future,
> but I thought I'd put it into the archives for someone to tackle.
> Bruce, would you add a TODO item linking to this:
>
> * Improve hash index build time by sorting
>
> regards, tom lane
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +