From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fast GiST index build - further improvements |
Date: | 2012-02-02 19:54:37 |
Message-ID: | CAPpHfduVKZMo05Zn+xSMCHbkfqd2gtj2k9vGCTjnG+RvwsE5Qw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 8, 2011 at 8:35 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Now that the main GiST index build patch has been committed, there's a few
> further improvements that could make it much faster still:
>
> Better management of the buffer pages on disk. At the moment, the
> temporary file is used as a heap of pages belonging to all the buffers in
> random order. I think we could speed up the algorithm considerably by
> reading/writing the buffer pages sequentially. For example, when an
> internal page is split, and all the tuples in its buffer are relocated,
> that would be a great chance to write the new pages of the new buffers in
> sequential order, instead of writing them back to the pages freed up by the
> original buffer, which can be spread all around the temp file. I wonder if
> we could use a separate file for each buffer? Or at least, a separate file
> for all buffers that are larger than, say 100 MB in size.
>
> Better management of in-memory buffer pages. When we start emptying a
> buffer, we currently flush all the buffer pages in memory to the temporary
> file, to make room for new buffer pages. But that's a waste of time, if
> some of the pages we had in memory belonged to the buffer we're about to
> empty next, or that we empty tuples to. Also, if while emptying a buffer,
> all the tuples go to just one or two lower level buffers, it would be
> beneficial to keep more than one page in-memory for those buffers.
>
> Buffering leaf pages. This I posted on a separate thread already:
> http://archives.postgresql.**org/message-id/4E5350DB.**
> 3060209(at)enterprisedb(dot)com<http://archives.postgresql.org/message-id/4E5350DB.3060209@enterprisedb.com>
>
>
> Also, at the moment there is one issue with the algorithm that we have
> glossed over this far: For each buffer, we keep some information in memory,
> in the hash table, and in the auxiliary lists. That means that the amount
> of memory needed for the build scales with the size of the index. If you're
> dealing with very large indexes, hopefully you also have a lot of RAM in
> your system, so I don't think this is a problem in practice. Still, it
> would be nice to do something about that. A straightforward idea would be
> to swap some of the information to disk. Another idea that, simpler to
> implement, would be to completely destroy a buffer, freeing all the memory
> it uses, when it becomes completely empty. Then, if you're about to run out
> of memory (as defined by maintenance_work_mem), you can empty some low
> level buffers to disk to free up some.
Unfortunately, I hadn't enough of time to implement something of this
before 9.2 release. Work on my Phd. thesis and web-projects takes too much
time.
But, I think there is one thing we should fix before 9.2 release. We assume
that gist index build have at least effective_cache_size/4 of cache. This
assumption could easily be false on high concurrency systems. I don't see
the way for convincing estimate here, but we could document this behaviour.
So, users could just tune effective_cache_size for gist index build on high
concurrency.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Bernd Helmle | 2012-02-02 21:08:35 | Re: pg_stats_recovery view |
Previous Message | Magnus Hagander | 2012-02-02 19:14:12 | Re: keywords in pg_hba.conf are case-sensitive? |