Re: Logical replication is missing block of rows when sending initial sync?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: depesz(at)depesz(dot)com, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication is missing block of rows when sending initial sync?
Date: 2023-11-03 12:20:22
Message-ID: a266b461-e340-10c6-d511-40f9edb37d28@enterprisedb.com
Lists: pgsql-bugs

On 11/3/23 13:04, hubert depesz lubaczewski wrote:
> On Fri, Nov 03, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
>> On Thu, Nov 2, 2023 at 4:53 PM hubert depesz lubaczewski
>> <depesz(at)depesz(dot)com> wrote:
>>>
>>> On Thu, Nov 02, 2023 at 10:17:13AM +0900, Kyotaro Horiguchi wrote:
>>>> At Mon, 30 Oct 2023 07:10:35 +0000, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
>>>>> I've tried, but I could not reproduce the failure. PSA the script I used.
>>>>
>>>> I'm not well-versed in the details of logical replication, but does
>>>> logical replication inherently operate in such a way that it fully
>>>> maintains relationships between tables? If not, isn't it possible that
>>>> the issue in question is not about missing referenced data, but merely
>>>> a temporary delay?
>>>
>>> The problem is that data that appeared *later* was visible on the
>>> subscriber. Data that came earlier was visible too. Just some block of
>>> data got, for some reason, skipped.
>>>
>>
>> Quite strange. I think to narrow down such a problem, the first thing
>> to figure out is whether the data is skipped by initial sync or later
>> replication. To find that out, you can check remote_lsn value in
>> pg_replication_origin_status for the origin used in the initial sync
>> once the relation reaches the 'ready' state. Then, you can try to see
>> on the publisher side using pg_waldump whether the missing rows exist
>> before the value of remote_lsn or after it. That can help us to narrow
>> down the problem and could give us some clues for the next steps.
>
> I will be prepping another set of clusters to upgrade soon, will try to
> get some more data. The window to work on the bad data isn't long,
> though.
>

I think it'd be interesting to know:

1) The commit LSN of the transactions that inserted the missing rows
(looked up via the rows' xmin).

2) Are there other changes from these same transactions that *did* get
replicated correctly?

3) LSNs used for the tablesync slot, catchup etc. I believe those are in
the server log.
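
Once those LSNs are collected, they have to be compared numerically: an
LSN is printed as two hex halves ("high/low"), so a plain string
comparison can give the wrong order. A minimal sketch of that comparison
in Python follows. The helper names and the sample LSN values are made up
for illustration and are not taken from this thread or from any
PostgreSQL tool.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def position_relative_to(commit_lsn: str, remote_lsn: str) -> str:
    """Report whether commit_lsn lies at/before or after remote_lsn."""
    if parse_lsn(commit_lsn) <= parse_lsn(remote_lsn):
        return "before_or_at"
    return "after"


if __name__ == "__main__":
    # Hypothetical values: remote_lsn as read from
    # pg_replication_origin_status, commit LSNs as read from pg_waldump.
    remote_lsn = "10/0"
    for commit_lsn in ("9/FFFFFF00", "10/00000001"):
        print(commit_lsn, position_relative_to(commit_lsn, remote_lsn))
```

A commit reported as "before_or_at" would point at the initial sync,
"after" at the later replication, following the split Amit described.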

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
