Re: pg_replication_origin_drop API potential race condition

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_replication_origin_drop API potential race condition
Date: 2021-02-05 07:01:53
Message-ID: CAA4eK1+D=M2TZUFd06rW1HSChpV6Qim6d_2+-7arC-ewMb3HAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 5, 2021 at 9:46 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> PSA patch updated per above suggestions.
>

Thanks, I have tested your patch and before the patch, I was getting
errors like "tuple concurrently deleted" or "cache lookup failed for
replication origin with oid 1" and after the patch, I am getting
"replication origin "origin-1" does not exist" which is clearly better
and user-friendly.

Before Patch
postgres=# select pg_replication_origin_drop('origin-1');
ERROR: tuple concurrently deleted
postgres=# select pg_replication_origin_drop('origin-1');
ERROR: cache lookup failed for replication origin with oid 1

After Patch
postgres=# select pg_replication_origin_drop('origin-1');
ERROR: replication origin "origin-1" does not exist

I wonder why you haven't changed the usage of the existing
replorigin_drop in the code? I have changed the same, added few
comments, ran pgindent, and updated the commit message in the
attached.

I am not completely whether we should retire replorigin_drop or just
keep it for backward compatibility? What do you think? Anybody else
has any opinion?

For others, the purpose of this patch is to "make
pg_replication_origin_drop safe against concurrent drops.". Currently,
we get the origin id from the name and then drop the origin by taking
ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time, and then when
they try to drop the origin, one of the sessions will get either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".

To prevent this race condition we do the entire operation under lock.
This obviates the need for replorigin_drop() API but we have kept it
for backward compatibility.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v3-0001-Make-pg_replication_origin_drop-safe-against-conc.patch application/octet-stream 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-02-05 07:06:15 RE: Single transaction in the tablesync worker?
Previous Message Bharath Rupireddy 2021-02-05 06:38:43 Re: logical replication worker accesses catalogs in error context callback