From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | depesz(at)depesz(dot)com, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Logical replication is missing block of rows when sending initial sync? |
Date: | 2023-11-03 12:20:22 |
Message-ID: | a266b461-e340-10c6-d511-40f9edb37d28@enterprisedb.com |
Lists: | pgsql-bugs |
On 11/3/23 13:04, hubert depesz lubaczewski wrote:
> On Fri, Nov 03, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
>> On Thu, Nov 2, 2023 at 4:53 PM hubert depesz lubaczewski
>> <depesz(at)depesz(dot)com> wrote:
>>>
>>> On Thu, Nov 02, 2023 at 10:17:13AM +0900, Kyotaro Horiguchi wrote:
>>>> At Mon, 30 Oct 2023 07:10:35 +0000, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
>>>>> I've tried, but I could not reproduce the failure. PSA the script I used.
>>>>
>>>> I'm not well-versed in the details of logical replication, but does
>>>> logical replication inherently operate in such a way that it fully
>>>> maintains relationships between tables? If not, isn't it possible that
>>>> the issue in question is not about missing referenced data, but merely
>>>> a temporary delay?
>>>
>>> The problem is that data that appeared *later* was visible on the
>>> subscriber. Data that came earlier was visible too. Just some block of
>>> data got, for some reason, skipped.
>>>
>>
>> Quite strange. I think to narrow down such a problem, the first thing
>> to figure out is whether the data is skipped by initial sync or later
>> replication. To find that out, you can check remote_lsn value in
>> pg_replication_origin_status for the origin used in the initial sync
>> once the relation reaches the 'ready' state. Then, you can try to see
>> on the publisher side using pg_waldump whether the missing rows exist
>> before the value of remote_lsn or after it. That can help us to narrow
>> down the problem and could give us some clues for the next steps.
>
> I will be prepping another set of clusters to upgrade soon, will try to
> get some more data. The window to work on the bad data isn't long,
> though.
>
I think it'd be interesting to know:
1) The commit LSN for the missing rows (i.e. for the transaction in
their xmin).
2) Are there other changes from these transactions that *did* get
replicated correctly?
3) LSNs used for the tablesync slot, catchup etc. I believe those are in
the server log.
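Once those LSNs are collected, comparing them by eye is error-prone,
because an LSN is printed as two hexadecimal numbers (high 32 bits, a
slash, low 32 bits) and does not sort correctly as a string. A minimal
sketch in Python of the comparison Amit describes above could look like
this. The helper names parse_lsn and classify_commit are hypothetical,
and the rule "at or before remote_lsn means initial sync" is a
simplifying assumption taken from his suggestion, not a statement of
how the tablesync worker actually decides.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'hi/lo' (both hex) to a 64-bit integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def classify_commit(commit_lsn: str, remote_lsn: str) -> str:
    """Guess which phase should have delivered a commit.

    Assumption (hypothetical rule for illustration): commits at or before
    the origin's remote_lsn belong to the initial sync, later ones to the
    regular replication stream.
    """
    if parse_lsn(commit_lsn) <= parse_lsn(remote_lsn):
        return "initial sync"
    return "later replication"
```

The commit_lsn argument would be the value from 1), and remote_lsn the
value read from pg_replication_origin_status once the relation is in
the 'ready' state.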
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company