Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-03-14 09:30:26
Message-ID: CAJ3gD9fwvgXO9L+gcoqj-XNuHxFR+iw10GiuoB7ytnUVWMeXeg@mail.gmail.com
Lists: pgsql-hackers

On Fri, 8 Mar 2019 at 20:59, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> >
> > On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I've a prototype attached, but let's discuss the details in a separate
> > > thread. This also needs to be changed for pluggable storage, as we don't
> > > know about table access methods in the startup process, so we can't
> > > determine which AM the heap is from during
> > > btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
> >
> > Attached is a WIP test patch
> > 0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
> > modified version of Craig Ringer's test cases
>
> Hi Andres,
>
> I am trying to come up with new testcases to test the recovery
> conflict handling. Before that I have some queries :
>
> With Craig Ringer's approach, the way to reproduce the recovery
> conflict was, I believe, easy : Do a checkpoint, which will log the
> global-catalog-xmin-advance WAL record, due to which the standby -
> while replaying the message - may find out that it's a recovery
> conflict. But with your approach, the latestRemovedXid is passed only
> during specific vacuum-related WAL records, so to reproduce the
> recovery conflict error, we need to make sure some specific WAL
> records are logged, such as XLOG_BTREE_DELETE. So we need to create a
> testcase such that while creating an index tuple, it erases dead
> tuples from a page, so that it eventually calls
> _bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
> XLOG_BTREE_DELETE record.
>
> I tried to come up with this reproducible testcase without success.
> This seems difficult. Do you have an easier option? Maybe we can use
> some other WAL record that has an easier, more reliable test case
> for triggering a recovery conflict?
>

I managed to get a recovery conflict by:
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on the standby
3. Creating a table on the master and inserting some data
4. Running: VACUUM FULL;
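
The steps above can be sketched as a per-node list of SQL statements. This is only an illustration, not part of the patch or the test file. The slot name is taken from the log output in this mail; the output plugin (test_decoding), the table name, and the use of ALTER SYSTEM plus pg_reload_conf() to turn off hot_standby_feedback are assumptions. Creating a logical slot on a standby also presumes the patch under discussion is applied.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    node: str  # "master" or "standby"
    sql: str   # statement to run on that node


def conflict_repro_steps(
    slot: str = "decoding_standby",   # name seen in the standby log
    plugin: str = "test_decoding",    # assumed output plugin
    table: str = "conflict_test",     # hypothetical table name
) -> List[Step]:
    """Return the reproduction steps in the order they were run."""
    return [
        # 1. Turn off hot_standby_feedback on the standby (assumed method).
        Step("standby", "ALTER SYSTEM SET hot_standby_feedback = off"),
        Step("standby", "SELECT pg_reload_conf()"),
        # 2. Create a logical replication slot on the standby.
        Step("standby",
             f"SELECT pg_create_logical_replication_slot('{slot}', '{plugin}')"),
        # 3. Create a table on the master and insert some data.
        Step("master", f"CREATE TABLE {table} (id int)"),
        Step("master", f"INSERT INTO {table} SELECT generate_series(1, 1000)"),
        # 4. Run VACUUM FULL on the master.
        Step("master", "VACUUM FULL"),
    ]


steps = conflict_repro_steps()
for step in steps:
    print(f"[{step.node}] {step.sql};")
```

Each printed line can be pasted into a psql session connected to the named node.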

This produces WARNING messages in the standby log file:
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/ catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98 for Heap2/CLEAN: remxid 477
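
Until the patch does something with the slot, a test could at most scan the standby log for this WARNING. Below is a minimal sketch of such a check. The message format is taken only from the sample above and may change as the patch evolves. The function and class names are hypothetical. The wraparound-aware comparison mirrors PostgreSQL's TransactionIdPrecedesOrEquals() for normal XIDs; that the patch uses exactly this test to decide on a conflict is an assumption.

```python
import re
from typing import NamedTuple, Optional


class SlotConflict(NamedTuple):
    slot: str
    catalog_xmin: int
    removed_xid: int


# \s+ between words so that a message wrapped across lines still matches.
_CONFLICT_RE = re.compile(
    r"WARNING:\s+slot\s+(\S+)\s+w/\s+catalog\s+xmin\s+(\d+)\s+"
    r"conflicts\s+with\s+removed\s+xid\s+(\d+)"
)


def find_slot_conflict(log_text: str) -> Optional[SlotConflict]:
    """Return the first slot-conflict WARNING found in log_text, if any."""
    m = _CONFLICT_RE.search(log_text)
    if m is None:
        return None
    return SlotConflict(m.group(1), int(m.group(2)), int(m.group(3)))


def xid_precedes_or_equals(a: int, b: int) -> bool:
    """Modulo-2^32 comparison of two normal XIDs (a <= b with wraparound)."""
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000  # reinterpret as a signed 32-bit value
    return diff <= 0


sample = (
    "2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/\n"
    "catalog xmin 474 conflicts with removed xid 477\n"
)
conflict = find_slot_conflict(sample)
```

For the sample, the slot's catalog xmin (474) precedes the removed xid (477), which is the situation reported as a conflict.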

But I did not add such a testcase to the test file because, with the
current patch, nothing is done with the slot; the standby just keeps
emitting the WARNING in the log file, so we can't test this scenario
with the TAP test as of now.

> Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
> just throws a WARNING error level; so the wal receiver would not make
> the backends throw an error; hence the test case won't catch the
> error. Is that right ?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
