Re: Excessive number of replication slots for 12->14 logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Hubert Lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Date: 2022-09-10 06:15:29
Message-ID: CAA4eK1J8f248S7uSPsgifNedk69H-VEUfTxqoM4DRPFB7QX1eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, Sep 10, 2022 at 11:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Sep 10, 2022 at 3:19 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Aug 30, 2022 at 3:44 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > Pushed.
> >
> > Tom reported buildfarm failures[1] and I've investigated the cause and
> > concluded this commit is relevant.
> >
>
> I was checking the failure logs at [1]. In below logs from 100_bugs_twoways.log:
>
> 2022-09-09 09:30:18.950 EDT [631b3fea.1c0:24]
> pg_16400_sync_16392_7141371862484106124 STATEMENT: START_REPLICATION
> SLOT "pg_16400_sync_16392_7141371862484106124" LOGICAL 0/1CC9510
> (proto_version '3', origin 'any', publication_names '"testpub"')
> 2022-09-09 09:30:18.952 EDT [631b3fea.1c0:25]
> pg_16400_sync_16392_7141371862484106124 ERROR: could not find record
> while sending logically-decoded data: missing contrecord at 0/1CCF9F0
> 2022-09-09 09:30:18.952 EDT [631b3fea.1c0:26]
> pg_16400_sync_16392_7141371862484106124 STATEMENT: START_REPLICATION
> SLOT "pg_16400_sync_16392_7141371862484106124" LOGICAL 0/1CC9510
> (proto_version '3', origin 'any', publication_names '"testpub"')
> ERROR: could not find record while sending logically-decoded data:
> missing contrecord at 0/1CCF9F0
> 2022-09-09 09:30:18.952 EDT [631b3fea.2b14:2] ERROR: error while
> shutting down streaming COPY: ERROR: could not find record while
> sending logically-decoded data: missing contrecord at 0/1CCF9F0
> 2022-09-09 09:30:18.953 EDT [631b3fea.1c0:27]
> pg_16400_sync_16392_7141371862484106124 LOG: disconnection: session
> time: 0:00:00.168 user=pgrunner database=d1 host=127.0.0.1 port=53881
> 2022-09-09 09:30:18.965 EDT [631b3fe6.137c:4] LOG: background worker
> "logical replication worker" (PID 11028) exited with exit code 1
> ....
> .....
> 2022-09-09 09:30:19.437 EDT [631b3feb.26e8:1] LOG: logical
> replication table synchronization worker for subscription "testsub",
> table "t" has started
> ....
> ....
> 2022-09-09 09:30:19.513 EDT [631b3feb.840:5]
> pg_16400_sync_16392_7141371862484106124 LOG: received replication
> command: START_REPLICATION SLOT
> "pg_16400_sync_16392_7141371862484106124" LOGICAL 0/0 (proto_version
> '3', origin 'any', publication_names '"testpub"')
> 2022-09-09 09:30:19.513 EDT [631b3feb.840:6]
> pg_16400_sync_16392_7141371862484106124 STATEMENT: START_REPLICATION
> SLOT "pg_16400_sync_16392_7141371862484106124" LOGICAL 0/0
> (proto_version '3', origin 'any', publication_names '"testpub"')
> 2022-09-09 09:30:19.515 EDT [631b3feb.840:7]
> pg_16400_sync_16392_7141371862484106124 LOG: starting logical
> decoding for slot "pg_16400_sync_16392_7141371862484106124"
> 2022-09-09 09:30:19.515 EDT [631b3feb.840:8]
> pg_16400_sync_16392_7141371862484106124 DETAIL: Streaming
> transactions committing after 0/1CCF9F0, reading WAL from 0/1CC94D8.
>
>
> One thing is not clear to me how the first time error: "could not find
> record while sending logically-decoded data ..." can happen due to
> this commit? Also, based on the origin even if the client sends a
> prior location (0/0 in this case) but the server will still start from
> the location where the client has confirmed the commit (aka
> confirmed_flush location).
>

I missed the point that if the 'origin_lsn' is ahead of the
'confirmed_flush' location then it will start from some prior location
which I think will be problematic.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Moisés Limón 2022-09-10 06:44:40 Bug in UPDATE statement
Previous Message Amit Kapila 2022-09-10 05:36:44 Re: Excessive number of replication slots for 12->14 logical replication