From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date: | 2019-09-12 00:56:44 |
Message-ID: | CAH2-WznW9AXpf71kxcWD=JcDMWDDssBZzrwVaXXYFLMqkufn0w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Sep 11, 2019 at 3:09 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Hmm. So v12 seems to have some problems with the WAL logging for
> posting list splits. With wal_debug = on and
> wal_consistency_checking='all', I can get a replica to fail
> consistency checking very quickly when "make installcheck" is run on
> the primary
I see the bug here. The problem is that we WAL-log a version of the
new item that already has its heap TID changed. On the primary, the
call to _bt_form_newposting() has a new item with the original heap
TID, which is then rewritten before being inserted -- that's correct.
But during recovery, we *start out with* a version of the new item
that *already* had its heap TID swapped. So we have nowhere to get the
original heap TID from during recovery.
Attached patch fixes the problem in a hacky way -- it WAL-logs the
original heap TID, just in case. Obviously this fix isn't usable, but
it should make the problem clearer.
Can you come up with a proper fix, please? I can think of one way of
doing it, but I'll leave the details to you.
The same issue exists in _bt_split(), so the tests will still fail
with wal_consistency_checking -- it just takes a lot longer to reach a
point where an inconsistent page is found, because posting list splits
that occur at the same point that we need to split a page are much
rarer than posting list splits that occur when we simply need to
insert, without splitting the page. I suggest using
wal_consistency_checking to test the fix that you come up with. As I
mentioned, I regularly use it. Also note that there are further
subtleties to doing this within _bt_split() -- see the FIXME comments
there.
Thanks
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
0001-Save-original-new-heap-TID-in-insert-WAL-record.patch | application/octet-stream | 3.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-09-12 01:33:58 | Re: Create collation reporting the ICU locale display name |
Previous Message | Tomas Vondra | 2019-09-12 00:16:34 | Re: [PATCH] Opclass parameters |