Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-01-09 06:46:43
Message-ID: CAA4eK1+gUBxKcYWg+MCC6Qbw-My+2wKUct+iFtr-_HgundUUBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> >
>
> > 3.
> > + /*
> > + * To build a slot name for the sync work, we are limited to NAMEDATALEN -
> > + * 1 characters.
> > + *
> > + * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
> > + * actually the NAMEDATALEN on the remote that matters, but this scheme
> > + * will also work reasonably if that is different.)
> > + */
> > + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
> > +
> > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
> >
> > The comments says syncslotname is limit to NAMEDATALEN - 1 characters.
> > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30,which seems not NAMEDATALEN - 1.
> > Should we change the comment here?
> >
>
> The comment wording is a remnant from older code which had a
> differently format slot name.
> I think the comment is still valid, albeit maybe unnecessary since in
> the current code the tablesync slot
> name length is fixed. But I left the older comment here as a safety reminder
> in case some future change would want to modify the slot name. What do
> you think?
>

I find it quite confusing. The comments should reflect the latest
code. You can probably say in some form that the length of slotname
shouldn't exceed NAMEDATALEN because of remote node constraints on
slot name length. Also, probably the StaticAssert on NAMEDATALEN is
not required.

1.
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters:
Subscription <parameter>oid</parameter>, Table
<parameter>relid</parameter>)
+ </para>

The last line seems too long. I think we are not strict for 80 char
limit in docs but it is good to be close to that, however, this
appears quite long.

2.
+ /*
+ * Cleanup any remaining tablesync resources.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;

I have already mentioned previously that let's not use this new style
of code (start using { to localize the scope of variables). I don't
know about others but I find it difficult to read such a code. You
might want to consider moving this whole block to a separate function.

3.
/*
+ * XXX - Should optimize this to avoid multiple
+ * connect/disconnect.
+ */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);

I think it is better to avoid multiple connect/disconnect here. In
this same function, we have connected to the publisher, we should be
able to use the same connection.

4.
process_syncing_tables_for_sync()
{
..
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the
tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
..
}

Both here and in DropSubscription(), it seems we are using
PG_TRY..PG_FINALLY just to free the memory even though
ReplicationSlotDropAtPubNode already has try..finally. Can we arrange
code to move allocation of syncslotname inside
ReplicationSlotDropAtPubNode to avoid additional try..finaly? BTW, if
the usage of try..finally here is only to free the memory, I am not
sure if it is required because I think we will anyway Reset the memory
context where this memory is allocated as part of error handling.

5.
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */

I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as
it is quite different from other adjoining state names and somehow not
going well with the code. How about SUBREL_STATE_ENDCOPY 'e' or
SUBREL_STATE_FINISHEDCOPY 'f'?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-01-09 07:27:24 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Bharath Rupireddy 2021-01-09 05:57:46 Re: logical replication worker accesses catalogs in error context callback