From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Block level concurrency during recovery |
Date: | 2008-10-23 08:57:34 |
Message-ID: | 1224752254.27145.608.camel@ebony.2ndQuadrant |
Lists: | pgsql-hackers |
On Thu, 2008-10-23 at 09:09 +0300, Heikki Linnakangas wrote:
> However, we require that in b-tree vacuum, you take a cleanup lock on
> *every* leaf page of the index, not only those that you modify. That's a
> problem, because there's no trace of such pages in the WAL.
OK, good. Thanks for the second opinion. I'm glad you said that, cos I
felt sure anybody reading the patch would say "what the hell does this
bit do?". Now I can add it.
My solution is fairly simple:
As we pass through the table we keep track of which blocks need
visiting, then append that information onto the next WAL record. If the
last block doesn't contain removed rows, then we send a no-op message
saying which blocks to visit.
I'd already invented the XLOG_BTREE_VACUUM record, so now we just need
to augment it further with two fields: ordered array of blocks to visit,
and a doit flag.
Say we have a 10 block table, with rows to be removed on blocks 3,4,8.
As we visit all 10 in sequence we would issue WAL records:
XLOG_BTREE_VACUUM block 3 visitFirst {1, 2} doit = true
XLOG_BTREE_VACUUM block 4 visitFirst {} doit = true
XLOG_BTREE_VACUUM block 8 visitFirst {5,6,7} doit = true
XLOG_BTREE_VACUUM block 10 visitFirst {9} doit = false
So that allows us to issue the same number of WAL messages yet include
all the required information to repeat the process correctly.
(The blocks can be visited out of sequence in some cases, hence the
ordered array of blocks to visit rather than just a first block value.)
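To make the record layout concrete, here is a minimal sketch of the scheme in Python rather than backend C. It is illustration only: `VacuumRecord`, `build_vacuum_records` and `replay` are hypothetical names, not anything in the patch, and the "lock" at replay time is just a list append standing in for taking a cleanup lock.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class VacuumRecord(NamedTuple):
    """Hypothetical stand-in for the augmented XLOG_BTREE_VACUUM record."""
    block: int                    # block the record is attached to
    visit_first: Tuple[int, ...]  # ordered blocks to cleanup-lock beforehand
    doit: bool                    # False means a no-op record (nothing removed)


def build_vacuum_records(scan_order: Iterable[int],
                         blocks_with_removals: Set[int]) -> List[VacuumRecord]:
    """Walk the blocks in scan order and piggyback the unmodified blocks
    onto the next record, as described above."""
    records: List[VacuumRecord] = []
    pending: List[int] = []  # visited blocks not yet mentioned in any record
    for blk in scan_order:
        if blk in blocks_with_removals:
            records.append(VacuumRecord(blk, tuple(pending), True))
            pending = []
        else:
            pending.append(blk)
    if pending:
        # The scan ended on blocks with nothing to remove: emit a no-op
        # record on the last one, listing the others to visit first.
        records.append(VacuumRecord(pending[-1], tuple(pending[:-1]), False))
    return records


def replay(records: Iterable[VacuumRecord]) -> Tuple[List[int], List[int]]:
    """Simulate recovery: return (blocks locked in order, blocks modified)."""
    locked: List[int] = []
    modified: List[int] = []
    for rec in records:
        locked.extend(rec.visit_first)  # lock-and-release each listed block
        locked.append(rec.block)
        if rec.doit:
            modified.append(rec.block)
    return locked, modified


records = build_vacuum_records(range(1, 11), {3, 4, 8})
locked, modified = replay(records)
```

With the 10 block example, `records` holds the same four records listed above, and replay touches every block once in scan order while modifying only 3, 4 and 8.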
It would also be possible to introduce a special tweak there, which is
that if the block is not in cache, don't read it in at all. If it's not
in cache we know that nobody has a pin on it, so we don't need to read it
in just to say "got the lock". That's icing for later.
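A rough sketch of that tweak, again in Python for illustration only. The `cached` set is an assumed stand-in for "is this block currently in shared buffers?", and `blocks_needing_cleanup_lock` is a hypothetical name, not a backend function.

```python
from typing import Iterable, List, Set


def blocks_needing_cleanup_lock(visit_first: Iterable[int],
                                cached: Set[int]) -> List[int]:
    """Keep only the blocks worth locking during replay.

    A block that is not in cache cannot be pinned by anyone, so there is
    no need to read it in just to take and release a cleanup lock.
    """
    return [blk for blk in visit_first if blk in cached]


# Blocks 5, 6, 7 are listed in the record, but only 6 is in cache.
to_lock = blocks_needing_cleanup_lock([5, 6, 7], cached={6, 9})
```

Here only block 6 would be locked; 5 and 7 are skipped without any I/O.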
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support