From: | hubert depesz lubaczewski <depesz(at)depesz(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Logical replication is missing block of rows when sending initial sync? |
Date: | 2023-11-10 11:44:17 |
Message-ID: | ZU4XkdOPCthiyN7L@depesz.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Thu, Nov 09, 2023 at 08:14:46PM +0100, Tomas Vondra wrote:
>
>
> On 11/8/23 15:02, hubert depesz lubaczewski wrote:
> >> OK. found another case (maybe two).
> >
> > OK, the other case is real. Different database. Same scenario.
> >
> > Errors (3 because we have 3 missing fkeys):
> >
> > #v+
> > ALTER TABLE ONLY c44s16539.assignment_overrides ADD CONSTRAINT fk_rails_58f8ee369b FOREIGN KEY (assignment_id) REFERENCES c44s16539.assignments(id);
> > ERROR: insert or update on table "assignment_overrides" violates foreign key constraint "fk_rails_58f8ee369b"
> > DETAIL: Key (assignment_id)=(248083) is not present in table "assignments".
> >
> > ALTER TABLE ONLY c44s16539.post_policies ADD CONSTRAINT fk_rails_cf2d119863 FOREIGN KEY (assignment_id) REFERENCES c44s16539.assignments(id);
> > ERROR: insert or update on table "post_policies" violates foreign key constraint "fk_rails_cf2d119863"
> > DETAIL: Key (assignment_id)=(248083) is not present in table "assignments".
> >
> > ALTER TABLE ONLY c44s16539.submissions ADD CONSTRAINT fk_rails_61cac0823d FOREIGN KEY (assignment_id) REFERENCES c44s16539.assignments(id);
> > ERROR: insert or update on table "submissions" violates foreign key constraint "fk_rails_61cac0823d"
> > DETAIL: Key (assignment_id)=(248083) is not present in table "assignments".
> > #v-
> >
> > In all cases the problem is missing c44s16539.assignments with is = 248083.
> >
> > This time it looks that only one row is missing:
> >
> > #v+
> > Pg14# select max(id) from c44s16539.assignments where id < 248083;
> > max
> > ────────
> > 248082
> > (1 row)
> >
> > Pg14# select min(id) from c44s16539.assignments where id > 248083;
> > min
> > ────────
> > 248084
> > (1 row)
> > #v-
> >
> > Schema is complicated. 76 columns. On pg12 side one column is dropped.
> > 22 indexes, 9 fkeys from this table (assignments) to others, ~ 35 fkeys
> > pointing to this table (but only 3 have rows that reference the
> > missing record.
> >
> > Some informationa bout the record:
> >
> > #v+
> > Pg12# select ctid, xmin, cmin, xmax, cmax, id, created_at, updated_at from c44s16539.assignments where id = 248083 \gx
> > ─[ RECORD 1 ]──────────────────────────
> > ctid │ (24192,1)
> > xmin │ 472291499
> > cmin │ 12
> > xmax │ 472291499
> > cmax │ 12
> > id │ 248083
> > created_at │ 2023-11-07 15:50:05.506077
> > updated_at │ 2023-11-07 15:51:28.738893
> > #v-
> >
> > logs for when it was added to replication:
> >
> > #v+
> > 2023-11-07 15:47:27.898 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,39,"ALTER PUBLICATION",2023-11-07 15:47:27 UTC,6/0,0,LOG,00000,"duration: 1.476 ms",,,,,,,,,"psql"
> > 2023-11-07 15:47:27.899 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,40,"idle",2023-11-07 15:47:27 UTC,6/1209097693,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c247s1094.messages_2022_50;",,,,,,,,,"psql"
> > 2023-11-07 15:47:27.900 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,41,"ALTER PUBLICATION",2023-11-07 15:47:27 UTC,6/0,0,LOG,00000,"duration: 1.493 ms",,,,,,,,,"psql"
> > 2023-11-07 15:47:27.900 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,42,"idle",2023-11-07 15:47:27 UTC,6/1209097694,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c44s16539.assignments;",,,,,,,,,"psql"
> > 2023-11-07 15:47:27.902 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,43,"ALTER PUBLICATION",2023-11-07 15:47:27 UTC,6/0,0,LOG,00000,"duration: 1.936 ms",,,,,,,,,"psql"
> > 2023-11-07 15:47:27.902 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,44,"idle",2023-11-07 15:47:27 UTC,6/1209097695,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c39s5131.messages_2023_20;",,,,,,,,,"psql"
> > 2023-11-07 15:47:27.903 UTC,"postgres","dbname",2333,"[local]",654a5c0f.91d,45,"ALTER PUBLICATION",2023-11-07 15:47:27 UTC,6/0,0,LOG,00000,"duration: 1.292 ms",,,,,,,,,"psql"
> > #v-
> >
> > It is interesting that the time when the row was inserted (created_at) is
> > always "around" the time that the table was added to publication...
> >
> > Just like with the other case, I can leave it in current state for
> > a day, so if you'd need more information from it, let me know.
> >
>
> I think it'd be interesting to look at the WAL using pg_waldump, and see
> how it relates to the LSN used for the tablesync slot.
well, I can try to dump, and show data, but it will most likely be
a lot. Are you more interested in looking into wal around the second
where data was inserted, or when table was added to publication?
> I'll try reproducing this locally over the weekend. Should I use the
> test_1030.sh script that you shared a week ago, or do I need to do
> something more?
That question is probably not to me, but to Hayato Kuroda
> One thing that puzzles me is that IIUC the script doesn't generate any
> changes while adding the tables to the subscription? I probably should
> have paid more attention to the details, instead of assuming there are
> concurrent changes - which seemed like there might be some race, or
> something. But this seems like there are no changes and the tablesync
> just fails to copy some subset of it. That's *really* strange.
This script was made by Hayato Kuroda, as far as I can tell. Our
situation is not like this.
In our case there is constant traffic on the publisher from normally
working app.
Best regards,
depesz
From | Date | Subject | |
---|---|---|---|
Next Message | hubert depesz lubaczewski | 2023-11-10 13:08:10 | Re: Logical replication is missing block of rows when sending initial sync? |
Previous Message | PG Bug reporting form | 2023-11-10 11:07:20 | BUG #18189: Value partitioned tables: Upd ERROR: more than one row returned by a subquery used as an expression |