Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, James Coleman <jtc331(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING
Date: 2025-01-09 07:46:23
Message-ID: CAHut+Pt1ROkF8eoAHrm4C8j_c0jmYjYTgA8WLZO_WP2BbpTgYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 4, 2025 at 4:23 AM Robert Treat <rob(at)xzilla(dot)net> wrote:
>
> On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > >
> > > On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > > > - how to set the replica identity. If a table without a replica identity is
> > > > + how to set the replica identity. If a table without a replica identity
> > > > + (or with replica identity behavior the same as <literal>NOTHING</literal>) is
> > > > added to a publication that replicates <command>UPDATE</command>
> > > > or <command>DELETE</command> operations then
> > > > subsequent <command>UPDATE</command> or <command>DELETE</command>
> > >
> > > I had the impression that the root of the confusion was the perceived difference
> > > between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
> > > doesn't improve that.
> > >
> > > How about:
> > >
> > > If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
> > > or set to a primary key or index that doesn't exist) is added ...
> > >
> >
> > Is it correct to say "set to a primary key or index that doesn't
> > exist"? Because when it is set to the primary key then it should work.
> >
> > I think Peter's proposal along with Ashutosh's proposal is the simpler
> > approach to clarify things in this area but I am fine if others find
> > some other way of updating docs better.
> >
>
> After reading this thread, I think part of the confusion here is that
> technically speaking there is no such thing as having "no replica
> identity"; when a table is created, by default it's "replica identity"
> is set to DEFAULT, and how it behaves at that point will be dependent
> on the structure of the table (PK or no PK), not whether it is being
> replicated/published. To that end, I've taken the above suggestions
> and re-worked them to remove that language, as well as add some
> additional clarity.
>
> Patch attached for this, I am going to update the commitfest with
> myself as author and will work this further if needed. Thanks all.
>
> Robert Treat
> https://xzilla.net

Hi Robert.

Thanks for picking up this patch.

Some review comments for the 0001 patch.

======
doc/src/sgml/logical-replication.sgml

1.
fallback if no other solution is possible. If a replica identity other
than <literal>FULL</literal> is set on the publisher side, a
replica identity
comprising the same or fewer columns must also be set on the subscriber
- side. See <xref linkend="sql-altertable-replica-identity"/> for details on
- how to set the replica identity. If a table without a replica identity is
+ side. If a table with replica identity set to <literal>NOTHING</literal>
+ (or set to use a primary key or index that doesn't exist) is
added to a publication that replicates <command>UPDATE</command>
or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>
operations will cause an error on the publisher. <command>INSERT</command>
operations can proceed regardless of any replica identity.
+ See <xref linkend="sql-altertable-replica-identity"/> for details on
+ how to set the replica identity.

1a.
That part "If a table with replica identity set to
<literal>NOTHING</literal> (or set to use a primary key or index that
doesn't exist) is added ..." is not very clear to me.

IIUC, there are 3 ways for this to be a problem:
i) RI is set to NOTHING
ii) RI is set to DEFAULT but there is no valid PK ==> same as NOTHING
iii) RI is set USING INDEX but then that index gets dropped ==> same as NOTHING

To avoid any misunderstandings, why don't we just spell that out in
full? So, ...

SUGGESTION
A replica identity behaves the same as <literal>NOTHING</literal> when
it is set to <literal>DEFAULT</literal> and there is no primary key,
or when it is set <literal>USING INDEX</literal> but the index no
longer exists. If a table with replica identity set to
<literal>NOTHING</literal> (or behaving the same as
<literal>NOTHING</literal>) is added to a publication that replicates
<command>UPDATE</command> or <command>DELETE</command> operations then
subsequent <command>UPDATE</command> or <command>DELETE</command>
operations will cause an error on the publisher.
~

1b.
I've always thought that for this important topic of logical
replication it is unusual that there is not even a heading for this
topic anywhere. This makes it unnecessarily difficult to find this
information. IMO it would greatly help just to have a "Replica
Identity" subsection for all this paragraph, so then it will appear
nicely in the Chapter 29 table-of-contents making it much easier to
find.

~

1c.
That great big slab of paragraph text is hard to read. I suspect it
was done that way simply to separate the RI topic visually from the
other (paragraph) topics of the sect1 heading. But now, after we
introduce the sect2 heading (my suggestion #1b above), we don't have
to do that anymore, so more blank lines can be added and it improves
the readability a lot.

======
src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml

2.
<para>
- Records the old values of the columns of the primary key, if any.
+ Records the old values of the columns of the primary key. When there
+ is no primary key, the behavior is the same as
<literal>NOTHING</literal>.
This is the default for non-system tables.
</para>

Currently what "This" refers to seems ambiguous. It might be better to
rearrange to put that last sentence as second.

SUGGESTION
Records the old values of the columns of the primary key. This is the
default for non-system tables. When there is no primary key, the
behavior is the same as NOTHING.

======

PSA a diff patch atop yours which makes my suggested changes

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
ps_diffs_atop_roberts_patch.txt text/plain 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2025-01-09 07:52:57 RE: ecpg command does not warn COPY ... FROM STDIN;
Previous Message Robins Tharakan 2025-01-09 07:18:36 Re: Several buildfarm animals fail tests because of shared memory error