From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bug in new buffering GiST build code |
Date: | 2012-05-27 21:46:53 |
Message-ID: | CAPpHfdvwEatx1+8ksdQey7HO=7eNxZ_TaQMJj4xaUB21p4jVWA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I think we should rewrite the way we track the parents completely. Rather
> than keep a path stack attached to every node buffer, let's just maintain a
> second hash table that contains the parent of every internal node. Whenever
> a downlink is moved to another page, update the hash table with the new
> information. That way we always have up-to-date information about the
> parent of every internal node.
>
> That's much easier to understand than the path stack structures we have
> now. I think the overall memory consumption will be about the same too.
> Although we need the extra hash table with one entry for every internal
> node, we get rid of the path stack structs, which are also one per every
> internal node at the moment.
>
> I believe it is faster too. I added some instrumentation to the existing
> gist code (with the additional getNodeBuffer() call added to fix this bug),
> to measure the time spent moving right, when refinding the parent of a
> page. I added gettimeofday() calls before and after moving right, and
> summed the total. In my test case, the final index size was about 19GB, and
> the index build took 3545 seconds (59 minutes). Of that time, 580 seconds
> (~ 10 minutes) was spent moving right to refind parents. That's a lot. I
> also printed a line whenever a refind operation had to move right 20 pages
> or more. That happened 2482 times during the build, in the worst case we
> moved right over 40000 pages.
>
> Attached is a patch to replace the path stacks with a hash table. With
> this patch, the index build time in my test case dropped from 59 minutes to
> about 55 minutes. I don'ẗ know how representative or repeatable this test
> case is, but this definitely seems very worthwhile, not only because it
> fixes the bug and makes the code simpler, but also on performance grounds.
>
Cool, seems that we've both simplier and faster implementation of finding
parent. Thanks!
> Alexander, do you still have the test environments and data lying around
> that you used for GiST buffering testing last summer? Could you rerun some
> of those tests with this patch?
I think I can restore test environment and data. Will rerun tests soon.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2012-05-27 22:14:01 | Re: Synchronized scans versus relcache reinitialization |
Previous Message | Euler Taveira | 2012-05-27 19:54:24 | Re: No, pg_size_pretty(numeric) was not such a hot idea |