From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Reducing the size of BufferTag & remodeling forks |
Date: | 2015-09-12 20:19:41 |
Message-ID: | 20150912201941.GA8311@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> Why do we have to do buffer lookups using the full buffer tag?
We don't necessarily.
> Why not just use (relNode, blockNum) and resolve hash collisions, if any?
I tried that and unfortunately it came out as a negative - the number of
collision gets higher and collisions in a hashtable will often imply a
cache miss. You can even see the comparison function rising in the
profile.
I've actually changed course a bit and I'm trying something different: A
two level structure. One hashtable that maps (RelFileNode, ForkNumber)
to a 'open relation' data structure, and from there a radix tree over
just the block number. To avoid having to look up in the hashtable
frequently there's a pointer in RelationData to a fork's radix tree.
This seems to have several advantages:
* It doesn't require changes to the physical representation
* A radix tree allows ordered traversal, allowing for efficient
prefetching et al.
* The ability to efficiently get all pages that belong to a relation
makes dropping/truncating tables radically more efficient than now.
* A single lookup in the radix tree is on average significantly faster
than in the hash table. I've measured ~350 vs 60 cycles in a
nonconcurrent workload and ~820 vs 72~ cycles in a concurrent
workload, using a recorded buffer access traces.
* Having a 'open relation' representation in shared memory also makes it
much easier to implement more efficient relation extension and
truncation - we can have pointers in there that signal how far the
relation has been extended and how far it has been initialized.
The biggest disadvantage is that it's a good bit more complex than what
we have right now. That we need dynamically many radix trees makes
memory management nontrivial.
Sounds halfway sane?
Andres
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2015-09-12 21:18:32 | Re: typo in create policy doc |
Previous Message | Pavel Stehule | 2015-09-12 18:27:52 | Re: On-demand running query plans using auto_explain and signals |