Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

From: Robert Treat <rob(at)xzilla(dot)net>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, James Coleman <jtc331(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING
Date: 2025-01-10 03:10:25
Message-ID: CAJSLCQ1UU8QeUsKgK1HSLAc1wMG8C2SXnRcvsZTc3e2PQ=Wmhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 9, 2025 at 2:46 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Sat, Jan 4, 2025 at 4:23 AM Robert Treat <rob(at)xzilla(dot)net> wrote:
> >
> > On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > > >
> > > > On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > > > > - how to set the replica identity. If a table without a replica identity is
> > > > > + how to set the replica identity. If a table without a replica identity
> > > > > + (or with replica identity behavior the same as <literal>NOTHING</literal>) is
> > > > > added to a publication that replicates <command>UPDATE</command>
> > > > > or <command>DELETE</command> operations then
> > > > > subsequent <command>UPDATE</command> or <command>DELETE</command>
> > > >
> > > > I had the impression that the root of the confusion was the perceived difference
> > > > between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
> > > > doesn't improve that.
> > > >
> > > > How about:
> > > >
> > > > If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
> > > > or set to a primary key or index that doesn't exist) is added ...
> > > >
> > >
> > > Is it correct to say "set to a primary key or index that doesn't
> > > exist"? Because when it is set to the primary key then it should work.
> > >
> > > I think Peter's proposal along with Ashutosh's proposal is the simpler
> > > approach to clarify things in this area but I am fine if others find
> > > some other way of updating docs better.
> > >
> >
> > After reading this thread, I think part of the confusion here is that
> > technically speaking there is no such thing as having "no replica
> > identity"; when a table is created, by default it's "replica identity"
> > is set to DEFAULT, and how it behaves at that point will be dependent
> > on the structure of the table (PK or no PK), not whether it is being
> > replicated/published. To that end, I've taken the above suggestions
> > and re-worked them to remove that language, as well as add some
> > additional clarity.
> >
> > Patch attached for this, I am going to update the commitfest with
> > myself as author and will work this further if needed. Thanks all.
> >
> > Robert Treat
> > https://xzilla.net
>
> Hi Robert.
>
> Thanks for picking up this patch.
>
> Some review comments for the 0001 patch.
>
> ======
> doc/src/sgml/logical-replication.sgml
>
> 1.
> fallback if no other solution is possible. If a replica identity other
> than <literal>FULL</literal> is set on the publisher side, a
> replica identity
> comprising the same or fewer columns must also be set on the subscriber
> - side. See <xref linkend="sql-altertable-replica-identity"/> for details on
> - how to set the replica identity. If a table without a replica identity is
> + side. If a table with replica identity set to <literal>NOTHING</literal>
> + (or set to use a primary key or index that doesn't exist) is
> added to a publication that replicates <command>UPDATE</command>
> or <command>DELETE</command> operations then
> subsequent <command>UPDATE</command> or <command>DELETE</command>
> operations will cause an error on the publisher. <command>INSERT</command>
> operations can proceed regardless of any replica identity.
> + See <xref linkend="sql-altertable-replica-identity"/> for details on
> + how to set the replica identity.
>
> 1a.
> That part "If a table with replica identity set to
> <literal>NOTHING</literal> (or set to use a primary key or index that
> doesn't exist) is added ..." is not very clear to me.
>
> IIUC, there are 3 ways for this to be a problem:
> i) RI is set to NOTHING
> ii) RI is set to DEFAULT but there is no valid PK ==> same as NOTHING
> iii) RI is set USING INDEX but then that index gets dropped ==> same as NOTHING
>
> To avoid any misunderstandings, why don't we just spell that out in
> full? So, ...
>
> SUGGESTION
> A replica identity behaves the same as <literal>NOTHING</literal> when
> it is set to <literal>DEFAULT</literal> and there is no primary key,
> or when it is set <literal>USING INDEX</literal> but the index no
> longer exists. If a table with replica identity set to
> <literal>NOTHING</literal> (or behaving the same as
> <literal>NOTHING</literal>) is added to a publication that replicates
> <command>UPDATE</command> or <command>DELETE</command> operations then
> subsequent <command>UPDATE</command> or <command>DELETE</command>
> operations will cause an error on the publisher.
> ~
>

This felt a little wordy, but incorporated it into an updated version.

> 1b.
> I've always thought that for this important topic of logical
> replication it is unusual that there is not even a heading for this
> topic anywhere. This makes it unnecessarily difficult to find this
> information. IMO it would greatly help just to have a "Replica
> Identity" subsection for all this paragraph, so then it will appear
> nicely in the Chapter 29 table-of-contents making it much easier to
> find.
>
> ~
>
> 1c.
> That great big slab of paragraph text is hard to read. I suspect it
> was done that way simply to separate the RI topic visually from the
> other (paragraph) topics of the sect1 heading. But now, after we
> introduce the sect2 heading (my suggestion #1b above), we don't have
> to do that anymore, so more blank lines can be added and it improves
> the readability a lot.
>

Agreed.

> ======
> src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
>
> 2.
> <para>
> - Records the old values of the columns of the primary key, if any.
> + Records the old values of the columns of the primary key. When there
> + is no primary key, the behavior is the same as
> <literal>NOTHING</literal>.
> This is the default for non-system tables.
> </para>
>
> Currently what "This" refers to seems ambiguous. It might be better to
> rearrange to put that last sentence as second.
>
> SUGGESTION
> Records the old values of the columns of the primary key. This is the
> default for non-system tables. When there is no primary key, the
> behavior is the same as NOTHING.
>
> ======
>

+1

> PSA a diff patch atop yours which makes my suggested changes
>

Updated patch incorporating your latest changes attached.

Robert Treat
https://xzilla.net

Attachment Content-Type Size
v2-0001-Expand-and-clarify-Replica-Identity-information.patch application/octet-stream 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2025-01-10 03:11:19 Re: CREATE SUBSCRIPTION - add missing test case
Previous Message Peter Smith 2025-01-10 02:50:55 Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size