Re: Logical Replication of sequences

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Logical Replication of sequences
Date: 2024-07-03 14:39:24
Message-ID: CALDaNm0mSSrvHNRnC67f0HWMpoLW9UzxGVXimhwbRtKjE7Aa-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 1 Jul 2024 at 12:57, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for the patch v20240625-0002
>
> ======
> Commit Message
>
> 1.
> This commit enhances logical replication by enabling the inclusion of all
> sequences in publications. This improvement facilitates seamless
> synchronization of sequence data during operations such as
> CREATE SUBSCRIPTION, REFRESH PUBLICATION, and REFRESH PUBLICATION SEQUENCES.
>
> ~
>
> Isn't this description getting ahead of the functionality a bit? For
> example, it talks about operations like REFRESH PUBLICATION SEQUENCES
> but AFAIK that syntax does not exist just yet.
>
> ~~~
>
> 2.
> The commit message should mention that you are only introducing new
> syntax for "FOR ALL SEQUENCES" here, but syntax for "FOR SEQUENCE" is
> being deferred to some later patch. Without such a note it is not
> clear why the gram.y syntax and docs seemed only half done.
>
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 3.
> <varlistentry id="sql-createpublication-params-for-all-tables">
> <term><literal>FOR ALL TABLES</literal></term>
> + <term><literal>FOR ALL SEQUENCES</literal></term>
> <listitem>
> <para>
> - Marks the publication as one that replicates changes for all tables in
> - the database, including tables created in the future.
> + Marks the publication as one that replicates changes for all tables or
> + sequences in the database, including tables created in the future.
>
> It might be better here to keep descriptions for "ALL TABLES" and "ALL
> SEQUENCES" separated, otherwise the wording does not quite seem
> appropriate for sequences (e.g. where it says "including tables
> created in the future").
>
> ~~~
>
> NITPICK - missing spaces
> NITPICK - removed Oxford commas since previously there were none
>
> ~~~
>
> 4.
> + If <literal>FOR TABLE</literal>, <literal>FOR ALL TABLES</literal>,
> + <literal>FOR ALL SEQUENCES</literal>,or <literal>FOR TABLES IN
> SCHEMA</literal>
> + are not specified, then the publication starts out with an empty set of
> + tables. That is useful if tables or schemas are to be added later.
>
> It seems like "FOR ALL SEQUENCES" is out of place since it is jammed
> between other clauses referring to TABLES. Would it be better to
> mention SEQUENCES last in the list?
>
> ~~~
>
> 5.
> + rights on the table. The <command>FOR ALL TABLES</command>,
> + <command>FOR ALL SEQUENCES</command>, and
> <command>FOR TABLES IN SCHEMA</command> clauses require the invoking
>
> ditto of #4 above.
>
> ======
> src/backend/catalog/pg_publication.c
>
> GetAllSequencesPublicationRelations:
>
> NITPICK - typo /relation/relations/
>
> ======
> src/backend/commands/publicationcmds.c
>
> 6.
> + foreach(lc, stmt->for_all_objects)
> + {
> + char *val = strVal(lfirst(lc));
> +
> + if (strcmp(val, "tables") == 0)
> + for_all_tables = true;
> + else if (strcmp(val, "sequences") == 0)
> + for_all_sequences = true;
> + }
>
> Consider the foreach_ptr macro to slightly simplify this code.
> Actually, this whole logic seems cumbersome -- can’t the parser assign
> flags automatically. Please see my more detailed comment #10 below
> about this in gram.y
>
> ~~~
>
> 7.
> /* FOR ALL TABLES requires superuser */
> - if (stmt->for_all_tables && !superuser())
> + if (for_all_tables && !superuser())
> ereport(ERROR,
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> errmsg("must be superuser to create FOR ALL TABLES publication")));
>
> + /* FOR ALL SEQUENCES requires superuser */
> + if (for_all_sequences && !superuser())
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be superuser to create FOR ALL SEQUENCES publication")));
> +
>
> The current code is easy to read, but I wonder if it should try harder
> to share common code, or at least a common translatable message like
> "must be superuser to create %s publication".
>
> ~~~
>
> 8.
> - else
> +
> + /*
> + * If the publication might have either tables or sequences (directly or
> + * through a schema), process that.
> + */
> + if (!for_all_tables || !for_all_sequences)
>
> I did not understand why this code cannot just say "else" like before,
> because the direct or through-schema syntax cannot be specified at the
> same time as "FOR ALL ...", so why is the more complicated condition
> necessary? Also, the similar code in AlterPublicationOptions() was not
> changed to be like this.
>
> ======
> src/backend/parser/gram.y
>
> 9. comment
>
> *
> * CREATE PUBLICATION FOR ALL TABLES [WITH options]
> *
> + * CREATE PUBLICATION FOR ALL SEQUENCES [WITH options]
> + *
> * CREATE PUBLICATION FOR pub_obj [, ...] [WITH options]
>
> The comment is not quite correct because actually you are allowing
> simultaneous FOR ALL TABLES, SEQUENCES. It should be more like:
>
> CREATE PUBLICATION FOR ALL pub_obj_type [,...] [WITH options]
>
> pub_obj_type is one of:
> TABLES
> SEQUENCES
>
> ~~~
>
> 10.
> +pub_obj_type: TABLES
> + { $$ = (Node *) makeString("tables"); }
> + | SEQUENCES
> + { $$ = (Node *) makeString("sequences"); }
> + ;
> +
> +pub_obj_type_list: pub_obj_type
> + { $$ = list_make1($1); }
> + | pub_obj_type_list ',' pub_obj_type
> + { $$ = lappend($1, $3); }
> + ;
>
> IIUC the only thing you need is a flag to say if FOR ALL TABLE is in
> effect and another flag to say if FOR ALL SEQUENCES is in effect. So,
> It seemed clunky to build up a temporary list of "tables" or
> "sequences" strings here, which is subsequently scanned by
> CreatePublication to be turned back into booleans.
>
> Can't we just change the CreatePublicationStmt field to have:
>
> A) a 'for_all_types' bitmask instead of a list:
> 0x0000 means FOR ALL is not specified
> 0x0001 means ALL TABLES
> 0x0010 means ALL SEQUENCES
>
> Or, B) have 2 boolean fields ('for_all_tables' and 'for_all_sequences')
>
> ...where the gram.y code can be written to assign the flag/s values directly?
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 11.
> if (pubinfo->puballtables)
> appendPQExpBufferStr(query, " FOR ALL TABLES");
>
> + if (pubinfo->puballsequences)
> + appendPQExpBufferStr(query, " FOR ALL SEQUENCES");
> +
>
> Hmm. Is that correct? It looks like a possible bug, because if both
> flags are true it will give invalid syntax like "FOR ALL TABLES FOR
> ALL SEQUENCES" instead of "FOR ALL TABLES, SEQUENCES"
>
> ======
> src/bin/pg_dump/t/002_pg_dump.pl
>
> 12.
> This could also try the test scenario of both FOR ALL being
> simultaneously set ("FOR ALL TABLES, SEQUENCES") to check for bugs
> like the suspected one in dump.c review comment #11 above.
>
> ======
> src/bin/psql/describe.c
>
> 13.
> + if (pset.sversion >= 170000)
> + printfPQExpBuffer(&buf,
> + "SELECT pubname AS \"%s\",\n"
> + " pg_catalog.pg_get_userbyid(pubowner) AS \"%s\",\n"
> + " puballtables AS \"%s\",\n"
> + " puballsequences AS \"%s\",\n"
> + " pubinsert AS \"%s\",\n"
> + " pubupdate AS \"%s\",\n"
> + " pubdelete AS \"%s\"",
> + gettext_noop("Name"),
> + gettext_noop("Owner"),
> + gettext_noop("All tables"),
> + gettext_noop("All sequences"),
> + gettext_noop("Inserts"),
> + gettext_noop("Updates"),
> + gettext_noop("Deletes"));
> + else
> + printfPQExpBuffer(&buf,
> + "SELECT pubname AS \"%s\",\n"
> + " pg_catalog.pg_get_userbyid(pubowner) AS \"%s\",\n"
> + " puballtables AS \"%s\",\n"
> + " pubinsert AS \"%s\",\n"
> + " pubupdate AS \"%s\",\n"
> + " pubdelete AS \"%s\"",
> + gettext_noop("Name"),
> + gettext_noop("Owner"),
> + gettext_noop("All tables"),
> + gettext_noop("Inserts"),
> + gettext_noop("Updates"),
> + gettext_noop("Deletes"));
> +
>
> IMO this should be coded differently so that only the
> "puballsequences" column is guarded by the (pset.sversion >= 170000),
> and everything else is the same as before. This suggested way would
> also be consistent with the existing code version checks (e.g. for
> "pubtruncate" or for "pubviaroot").
>
> ~~~
>
> NITPICK - Add blank lines
> NITPICK - space in "ncols ++"
>
> ======
> src/bin/psql/tab-complete.c
>
> 14.
> Hmm. When I tried this, it didn't seem to be working properly.
>
> For example "CREATE PUBLICATION pub1 FOR ALL" only completes with
> "TABLES" but not "SEQUENCES".
> For example "CREATE PUBLICATION pub1 FOR ALL SEQ" doesn't complete
> "SEQUENCES" properly
>
> ======
> src/include/catalog/pg_publication.h
>
> NITPICK - move the extern to be adjacent to others like it.
>
> ======
> src/include/nodes/parsenodes.h
>
> 15.
> - bool for_all_tables; /* Special publication for all tables in db */
> + List *for_all_objects; /* Special publication for all objects in
> + * db */
> } CreatePublicationStmt;
>
> I felt this List logic is a bit strange. See my comment #10 in gram.y
> for more details.
>
> ~~~
>
> 16.
> - bool for_all_tables; /* Special publication for all tables in db */
> + List *for_all_objects; /* Special publication for all objects in
> + * db */
>
> Ditto comment #15 in AlterPublicationStmt
>
> ======
> src/test/regress/sql/publication.sql
>
> 17.
> +CREATE SEQUENCE testpub_seq0;
> +CREATE SEQUENCE pub_test.testpub_seq1;
> +
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub_forallsequences FOR ALL SEQUENCES;
> +RESET client_min_messages;
> +
> +SELECT pubname, puballtables, puballsequences FROM pg_publication
> WHERE pubname = 'testpub_forallsequences';
> +\d+ pub_test.testpub_seq1
>
> Should you also do "\d+ tespub_seq0" here? Otherwise what was the
> point of defining the seq0 sequence being in this test?
>
> ~~~
>
> 18.
> Maybe there are missing test cases for different syntax combinations like:
>
> FOR ALL TABLES, SEQUENCES
> FOR ALL SEQUENCES, TABLES
>
> Note that the current list logic of this patch even considers my
> following bogus statement syntax is OK.
>
> test_pub=# CREATE PUBLICATION pub_silly FOR ALL TABLES, SEQUENCES,
> TABLES, TABLES, TABLES, SEQUENCES;
> CREATE PUBLICATION
> test_pub=#
>
> ======
> 99.
> Please also refer to the attached nitpicks patch which implements all
> the cosmetic issues identified above as NITPICKS.

Thank you for your feedback. I have addressed all the comments in the
attached patch.

Regards,
Vignesh

Attachment Content-Type Size
v20240703-0002-Introduce-ALL-SEQUENCES-support-for-Postgr.patch text/x-patch 95.7 KB
v20240703-0001-Introduce-pg_sequence_state-and-SetSequenc.patch text/x-patch 13.9 KB
v20240703-0003-Enhance-sequence-synchronization-during-su.patch text/x-patch 52.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-07-03 14:40:55 Re: Logical Replication of sequences
Previous Message Nathan Bossart 2024-07-03 14:29:50 Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal