Re: Single transaction in the tablesync worker?

From: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-02-06 09:41:17
Message-ID: 03e167c7-ccdd-e036-90ab-8231e9f4c944@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 06/02/2021 06:07, Amit Kapila wrote:
> On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>> On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
>> <petr(dot)jelinek(at)enterprisedb(dot)com> wrote:
>>>> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
>>>> +{
>>>> + if (syncslotname)
>>>> + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
>>>> + else
>>>> + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
>>>> +
>>>> + return syncslotname;
>>>> +}
>>> Given that we are now explicitly dropping slots, what happens here if we
>>> have 2 different downstreams that happen to get same suboid and reloid,
>>> will one of the drop the slot of the other one? Previously with the
>>> cleanup being left to temp slot we'd at maximum got error when creating
>>> it but with the new logic in LogicalRepSyncTableStart it feels like we
>>> could get into situation where 2 downstreams are fighting over slot no?
>>>
> I think so. See, if the alternative suggested below works or if you
> have any other suggestions for the same?
>
>> The PG docs [1] says "there is only one copy of pg_subscription per
>> cluster, not one per database". IIUC that means it is not possible for
>> 2 different subscriptions to have the same suboid.
>>
> I think he is talking about two different clusters having separate
> subscriptions but point to the same publisher. In different clusters,
> we can get the same subid/relid. I think we need a cluster-wide unique
> identifier to distinguish among different subscribers. How about using
> the system_identifier stored in the control file (we can use
> GetSystemIdentifier to retrieve it). I think one concern could be
> that adding that to slot name could exceed the max length of slot
> (NAMEDATALEN -1) but I don't think that is the case here
> (pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0')). Note last
> is system_identifier in this scheme.

Yep that's what I mean and system_identifier seems like a good choice to me.

> Do you guys think that works or let me know if you have any other
> better idea? Petr, is there a reason why such an identifier is not
> considered originally, is there any risk in it?

Originally it was not considered likely because it's all based on
pglogical/BDR work where ids are hashes of stuff that's unique across
group of instances, not counter based like Oids in PostgreSQL and I
simply didn't realize it could be a problem until reading this patch :)

--
Petr Jelinek

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2021-02-06 09:56:01 Re: pg_replication_origin_drop API potential race condition
Previous Message osumi.takamichi@fujitsu.com 2021-02-06 07:30:40 RE: Single transaction in the tablesync worker?