From: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Minimal logical decoding on standbys |
Date: | 2019-04-05 11:38:39 |
Message-ID: | CAJ3gD9cFg-FgG5P=p7yym2RMQ0cz9bKkOQ+Mp9p6vxm2se1=FA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 3 Apr 2019 at 19:57, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> Oops, it was my own change that caused the hang. Sorry for the noise.
> After using wal_debug, found out that after replaying the LOCK records
> for the catalog pg_auth, it was not releasing it because it had
> actually got stuck in ReplicationSlotDropPtr() itself. In
> ResolveRecoveryConflictWithSlots(), a shared
> ReplicationSlotControlLock was already held before iterating through
> the slots, and now ReplicationSlotDropPtr() again tries to take the
> same lock in exclusive mode for setting slot->in_use, leading to a
> deadlock. I fixed that by releasing the shared lock before calling
> ReplicationSlotDropPtr(), and then re-starting the slots' scan over
> again since we released it. We do similar thing for
> ReplicationSlotCleanup().
>
> Attached is a rebased version of your patch
> logical-decoding-on-standby.patch. This v2 version also has the above
> changes. It also includes the tap test file which is still in WIP
> state, mainly because I have yet to add the conflict recovery handling
> scenarios.
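The release-and-rescan fix described in the quoted text can be sketched as follows. This is a toy, single-threaded model and not the code from the patch: `Slot`, `slot_drop()`, `drop_conflicting_slots()` and the lock helpers are hypothetical stand-ins for the real slot array, ReplicationSlotDropPtr(), ResolveRecoveryConflictWithSlots() and ReplicationSlotControlLock. The conflict test is assumed to be a plain integer comparison, ignoring xid wraparound. The toy lock asserts where a real lock would block forever, so the self-deadlock shows up as an assertion failure instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 4

/* Hypothetical stand-in for a replication slot; not PostgreSQL's struct. */
typedef struct Slot
{
    bool     in_use;
    unsigned catalog_xmin;
} Slot;

static Slot slots[MAX_SLOTS];

/* Toy model of a shared/exclusive lock (stand-in for the control lock). */
static int  shared_holders = 0;
static bool exclusive_held = false;

static void lock_shared(void)      { assert(!exclusive_held); shared_holders++; }
static void unlock_shared(void)    { assert(shared_holders > 0); shared_holders--; }
static void unlock_exclusive(void) { assert(exclusive_held); exclusive_held = false; }

/* A real lock would wait here; with our own shared hold that wait never ends. */
static void lock_exclusive(void)
{
    assert(shared_holders == 0 && !exclusive_held);
    exclusive_held = true;
}

/* Stand-in for the drop routine: it takes the lock exclusively itself. */
static void slot_drop(Slot *s)
{
    lock_exclusive();
    s->in_use = false;
    unlock_exclusive();
}

/*
 * Drop every in-use slot whose catalog_xmin is older than conflict_xid.
 * The shared lock is released before each drop, and the scan restarts
 * from the beginning because the array may have changed while unlocked.
 * Returns the number of slots dropped.
 */
static int drop_conflicting_slots(unsigned conflict_xid)
{
    int dropped = 0;

restart:
    lock_shared();
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        Slot *s = &slots[i];

        if (!s->in_use || s->catalog_xmin >= conflict_xid)
            continue;

        unlock_shared();        /* must not hold it across slot_drop() */
        slot_drop(s);
        dropped++;
        goto restart;
    }
    unlock_shared();
    return dropped;
}
```

Removing the `unlock_shared()` call before `slot_drop()` makes the assertion in `lock_exclusive()` fire, which corresponds to the hang seen during replay. Each restart finds one fewer conflicting slot, so the loop terminates.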
The attached v3 patch adds a new scenario that tests conflict recovery
handling by verifying that the conflicting slot gets dropped.

With this, I am done with the test changes, except for the question
below, which I posted earlier and would like input on:

Regarding the test failures: when we drop a logical replication slot
on the standby server, the catalog_xmin of the physical replication
slot becomes NULL, whereas the test expects it to be equal to xmin.
That is why a couple of test scenarios are failing:

ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584

I am not sure what is expected. What actually happens is: the
physical slot's catalog_xmin is NULL initially, but becomes non-NULL
after the logical replication slot is created on the standby.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachment | Content-Type | Size |
---|---|---|
logical-decoding-on-standby_v3.patch | application/octet-stream | 39.7 KB |