Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-19 06:40:30
Message-ID: CAA4eK1+-Qgq1SrsMz8vufd2-yOVuj0H-PaTYTxe-e6krY702kg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> TODO / Known Issues:
>
> * the current implementation of tablesync drop slot (e.g. from
> DropSubscription or finish_sync_worker) regenerates the tablesync slot
> name so it knows what slot to drop.
>

If you always drop the slot in finish_sync_worker, then in which case
do you need to drop it during DropSubscription? Is it for the case where
the table sync workers have crashed?

> The current code might be ok for
> normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
> (slot_name = newname) it would fail to be able to find the tablesync
> slot.
>

Sure, but the same will be true for the apply worker slot as well. I
agree the problem is more pronounced for table sync workers, but I think
we can solve it; see below.

> * I think if there are crashed tablesync workers then they are not
> known to DropSubscription. So this might be a problem to cleanup slots
> and/or origin tracking belonging to those unknown workers.
>

Yeah, I think we can do two things to avoid this and the previous
problem. (a) We can generate the slot_name for the table sync worker
based only on subscription_id and rel_id. (b) Immediately after
creating the slot, advance the replication origin to the position
(origin_startpos) we get from walrcv_create_slot. This will help us to
start from the right location.
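
To illustrate point (a), here is a minimal sketch of a name generator that
depends only on the subscription OID and the relation OID. Presumably the
benefit is that DropSubscription can recompute the name of every tablesync
slot without knowing which workers are alive, and that ALTER SUBSCRIPTION ...
SET (slot_name = newname) no longer affects it. The helper name
tablesync_slot_name and the "pg_<subid>_sync_<relid>" format are assumptions
made for illustration, not something taken from the patch; Oid and
NAMEDATALEN are redefined locally so the fragment compiles on its own.

```c
#include <stdbool.h>
#include <stdio.h>

/* Local stand-ins for the PostgreSQL definitions (assumed values). */
typedef unsigned int Oid;
#define NAMEDATALEN 64

/*
 * Hypothetical helper: build the tablesync slot name from the subscription
 * OID and the relation OID only. The name does not depend on the
 * subscription's own slot_name, so it can be regenerated at any time.
 *
 * Returns false if the name does not fit into the supplied buffer.
 */
static bool
tablesync_slot_name(Oid subid, Oid relid, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "pg_%u_sync_%u", subid, relid);

    /* snprintf returns the untruncated length; reject truncated names. */
    return n > 0 && (size_t) n < buflen;
}
```

With such a helper, the drop path would only need the (subid, relid) pairs
recorded for the subscription in order to find each slot, whether or not the
corresponding worker has crashed.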

Do you see anything which will still not be addressed after doing the above?

I understand why you are trying to create this patch atop the logical
decoding of 2PC patch, but I think it is better to create this as an
independent patch and then use it to test the 2PC problem. Also, please
explain what kind of testing you did to ensure that it works properly
when the table sync worker restarts after a crash.

--
With Regards,
Amit Kapila.
