Re: [WIP] [B-Tree] Retail IndexTuple deletion

From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] [B-Tree] Retail IndexTuple deletion
Date: 2018-06-25 18:20:26
Message-ID: 4d398a14-f987-31c5-90a4-ff7bb21c18d9@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23.06.2018 01:14, Peter Geoghegan wrote:
> On Fri, Jun 22, 2018 at 12:43 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>> According to your feedback, i develop second version of the patch.
>>> In this version:
>>> 1. High-level functions index_beginscan(), index_rescan() not used. Tree
>>> descent made by _bt_search(). _bt_binsrch() used for positioning on the
>>> page.
>>> 2. TID list introduced in amtargetdelete() interface. Now only one tree
>>> descent needed for deletion all tid's from the list with equal scan key
>>> value - logical duplicates deletion problem.
>>> 3. Only one WAL record for index tuple deletion per leaf page per
>>> amtargetdelete() call.
>>
>> Cool.
>>
>> What is this "race" code about?
>
I introduce this check because keep in mind about another vacuum
workers, which can make cleaning a relation concurrently. may be it is
redundant.

> I noticed another bug in your patch, when running a
> "wal_consistency_checking=all" smoke test. I do this simple, generic
> test for anything that touches WAL-logging, actually -- it's a good
> practice to adopt.
>
> I enable "wal_consistency_checking=all" on the installation, create a
> streaming replica with pg_basebackup (which also has
> "wal_consistency_checking=all"), and then run "make installcheck"
> against the primary. Here is what I see on the standby when I do this
> with v2 of your patch applied:
>
> 9524/2018-06-22 13:03:12 PDT LOG: entering standby mode
> 9524/2018-06-22 13:03:12 PDT LOG: consistent recovery state reached
> at 0/30000D0
> 9524/2018-06-22 13:03:12 PDT LOG: invalid record length at 0/30000D0:
> wanted 24, got 0
> 9523/2018-06-22 13:03:12 PDT LOG: database system is ready to accept
> read only connections
> 9528/2018-06-22 13:03:12 PDT LOG: started streaming WAL from primary
> at 0/3000000 on timeline 1
> 9524/2018-06-22 13:03:12 PDT LOG: redo starts at 0/30000D0
> 9524/2018-06-22 13:03:32 PDT FATAL: inconsistent page found, rel
> 1663/16384/1259, forknum 0, blkno 0
> 9524/2018-06-22 13:03:32 PDT CONTEXT: WAL redo at 0/3360B00 for
> Heap2/CLEAN: remxid 599
> 9523/2018-06-22 13:03:32 PDT LOG: startup process (PID 9524) exited
> with exit code 1
> 9523/2018-06-22 13:03:32 PDT LOG: terminating any other active server processes
> 9523/2018-06-22 13:03:32 PDT LOG: database system is shut down
>
> I haven't investigated this at all, but I assume that the problem is a
> simple oversight. The new ItemIdSetDeadRedirect() concept that you've
> introduced probably necessitates changes in both the WAL logging
> routines and the redo/recovery routines. You need to go make those
> changes. (By the way, I don't think you should be using the constant
> "3" with the ItemIdIsDeadRedirection() macro definition.)
>
> Let me know if you get stuck on this, or need more direction.
>
I was investigated the bug of the simple smoke test. You're right: make
any manipulations with line pointer in heap_page_prune() without
reflection in WAL record is no good idea.
But this consistency problem arises even on clean PostgreSQL
installation (without my patch) with ItemIdSetDead() -> ItemIdMarkDead()
replacement.
Byte-by-byte comparison of master and replay pages shows only 2 bytes
difference in the tuple storage part of page.
I don't stuck on yet, but good ideas are welcome.

--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2018-06-25 19:25:43 Re: Constraint documentation
Previous Message Jeremy Finzel 2018-06-25 17:48:37 Re: Some pgq table rewrite incompatibility with logical decoding?