Re: Logical replication is missing block of rows when sending initial sync?

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication is missing block of rows when sending initial sync?
Date: 2023-11-08 12:20:36
Message-ID: ZUt9FOueePOl2ZUp@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Nov 03, 2023 at 01:22:42PM +0100, hubert depesz lubaczewski wrote:
> On Fri, Nov 03, 2023 at 01:20:22PM +0100, Tomas Vondra wrote:
> > I think it'd be interesting to know:
> >
> > 1) Commit LSN for the missing rows (for the xmin).
> >
> > 2) Are the other changes for these transactions that *got* replicated
> > correctly?
>
> How would I know?
>
> > 3) LSNs used for the tablesync slot, catchup etc. I believe those are in
> > the server log.
>
> I will provide more logs on the next problem.

OK. found another case (maybe two).

First, error:
ALTER TABLE ONLY c203s15531.conversation_message_participants ADD CONSTRAINT fk_rails_992a8b9e13 FOREIGN KEY (conversation_message_id) REFERENCES c203s15531.conversation_messages(id);
ERROR: insert or update on table "conversation_message_participants" violates foreign key constraint "fk_rails_992a8b9e13"
DETAIL: Key (conversation_message_id)=(431568) is not present in table "conversation_messages".

With forced seq scans (set enable_indexscan = false; set enable_bitmapscan = false; set enable_indexonlyscan = false;)

I got:

(12/bionic):

#v+
=# select count(*) from c203s15531.conversation_messages where id = 431568;
count
───────
1
(1 row)

=# select count(*) from c203s15531.conversation_message_participants where conversation_message_id=431568;
count
───────
2
(1 row)
#v-

(14/focal):

#v+
=# select count(*) from c203s15531.conversation_messages where id = 431568;
count
───────
0
(1 row)

=# select count(*) from c203s15531.conversation_message_participants where conversation_message_id=431568;
count
───────
2
(1 row)
#v-

So it looks that some rows are missing. Let's see the missing range:

#v+
Pg14# select max(id) from conversation_messages where id < 431568;
max
────────
431567
(1 row)

Pg14# select min(id) from conversation_messages where id > 431568;
min
────────
431572
(1 row)
#v-

So, the block from 431568 to 431571 is missing. On pg12, though:

#v+
Pg12# select count(*) from conversation_messages where id between 431568 and 431571;
count
───────
4
(1 row)
Pg12# select ctid, xmin, cmin, xmax, cmax, id, created_at from conversation_messages where id between 431568 and 431571;
ctid │ xmin │ cmin │ xmax │ cmax │ id │ created_at
────────────┼────────────┼──────┼────────────┼──────┼────────┼────────────────────────────
(29604,10) │ 3095841913 │ 1 │ 3095841995 │ 1 │ 431568 │ 2023-11-07 15:12:38.712504
(29604,11) │ 3095848690 │ 2 │ 3095848745 │ 2 │ 431569 │ 2023-11-07 15:12:49.5695
(29604,12) │ 3095884949 │ 2 │ 3095885033 │ 2 │ 431570 │ 2023-11-07 15:13:37.511025
(29604,13) │ 3095917833 │ 2 │ 3095917890 │ 2 │ 431571 │ 2023-11-07 15:14:40.058367
(4 rows)
#v-

In logs on bionic/pg12 I found when this particular table was added to subscription:

#v+
2023-11-07 15:10:19.023 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,29,"idle",2023-11-07 15:10:18 UTC,30/865502097,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15745.a_g_c_r_2023_10;",,,,,,,,,"psql"
2023-11-07 15:10:19.026 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,30,"idle",2023-11-07 15:10:18 UTC,30/865502098,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15311.q_q;",,,,,,,,,"psql"
2023-11-07 15:10:19.032 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,31,"idle",2023-11-07 15:10:18 UTC,30/865502099,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s16352.m_2023_06;",,,,,,,,,"psql"
2023-11-07 15:10:19.035 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,32,"idle",2023-11-07 15:10:18 UTC,30/865502100,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15531.conversation_messages;",,,,,,,,,"psql"
2023-11-07 15:10:19.039 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,33,"idle",2023-11-07 15:10:18 UTC,30/865502101,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15479.q_s_e_2023_10;",,,,,,,,,"psql"
2023-11-07 15:10:19.043 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,34,"idle",2023-11-07 15:10:18 UTC,30/865502102,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s17062.wiki_pages;",,,,,,,,,"psql"
2023-11-07 15:10:19.046 UTC,"postgres","dbname",7350,"[local]",654a535a.1cb6,35,"idle",2023-11-07 15:10:18 UTC,30/865502103,0,LOG,00000,"statement: ALTER PUBLICATION focal14 ADD TABLE ONLY c203s15841.a_g_c_r_2023_10;",,,,,,,,,"psql"
#v-

Since it was previously asked - schema:

#v+
Column │ Type │ Collation │ Nullable │ Default
───────────────────────┼─────────────────────────────┼───────────┼──────────┼───────────────────────────────────────────────────
id │ bigint │ │ not null │ nextval('conversation_messages_id_seq'::regclass)
..................... │ bigint │ │ │
..................... │ bigint │ │ │
created_at │ timestamp without time zone │ │ │
..................... │ boolean │ │ │
..................... │ text │ │ │
..................... │ text │ │ │
..................... │ character varying(255) │ │ │
..................... │ character varying(255) │ │ │
..................... │ bigint │ │ │
..................... │ character varying(255) │ │ │
..................... │ bigint │ │ │
..................... │ character varying(255) │ │ │
..................... │ text │ │ │
..................... │ boolean │ │ │
..................... │ boolean │ │ │
..................... │ text │ │ │
Indexes:
"conversation_messages_pkey" PRIMARY KEY, btree (id), tablespace "data1"
"index_conversation_messages_on_...." btree (.........), tablespace "data1"
"index_conversation_messages_on_......_and_created_at" btree (..........., created_at), tablespace "data1"
Foreign-key constraints:
"fk_rails_c3c322d882" FOREIGN KEY (conversation_id) REFERENCES conversations(id)
Referenced by:
TABLE "conversation_message_participants" CONSTRAINT "fk_rails_992a8b9e13" FOREIGN KEY (conversation_message_id) REFERENCES conversation_messages(id)
TABLE "conversation_batches" CONSTRAINT "fk_rails_d421fc74f4" FOREIGN KEY (root_conversation_message_id) REFERENCES conversation_messages(id)
Publications:
"cdc"
"focal14"
Triggers:
guard_excessive_deletes AFTER DELETE ON conversation_messages REFERENCING OLD TABLE AS oldtbl FOR EACH STATEMENT EXECUTE FUNCTION guard_excessive_updates()
guard_excessive_updates AFTER UPDATE ON conversation_messages REFERENCING OLD TABLE AS oldtbl FOR EACH STATEMENT EXECUTE FUNCTION guard_excessive_updates()
#v-

There are no dropped attributes.

Stats:

#v+
Pg12# select * from pg_stat_user_tables where relid = 'c203s15531.conversation_messages'::regclass \gx
─[ RECORD 1 ]───────┬──────────────────────────────
relid │ 166420
schemaname │ c203s15531
relname │ conversation_messages
seq_scan │ 14
seq_tup_read │ 2595888
idx_scan │ 16059568
idx_tup_fetch │ 27426090
n_tup_ins │ 372696
n_tup_upd │ 38
n_tup_del │ 0
n_tup_hot_upd │ 0
n_live_tup │ 432630
n_dead_tup │ 0
n_mod_since_analyze │ 7034
last_vacuum │ [null]
last_autovacuum │ 2023-08-10 18:06:11.655573+00
last_analyze │ [null]
last_autoanalyze │ 2023-11-01 18:56:52.868395+00
vacuum_count │ 0
autovacuum_count │ 5
analyze_count │ 0
autoanalyze_count │ 91
#v-

I can leave it as is for, let's say, a day. So if there are any queries/data
you'd like me to find out, let me know, but afterwards I will have to rebuild
the focal from scratch.

For whatever it's worth, we're adding tables to publication using:

#v+
with base as (
select n.nspname, c.relname, c.relpages
from pg_class c join pg_namespace n on c.relnamespace = n.oid
left outer join pg_catalog.pg_publication_tables pt on pt.pubname = 'focal14' and pt.schemaname = n.nspname and pt.tablename = c.relname
where c.relkind in ('p', 'r') and n.nspname ~ 'schemas|we|care|about' and pt.pubname is null
)
select format('ALTER PUBLICATION focal14 ADD TABLE ONLY %I.%I;', nspname, relname)
from base
order by relpages desc
limit 40 \gexec
#v-

Best regards,

depesz

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Daniel Gustafsson 2023-11-08 12:27:43 Re: Windows compatibility
Previous Message Manika Singhal 2023-11-08 07:02:23 Re: BUG #18185: Error when calling whoami at the beginning of the installation