From: | Fabrice Chapuis <fabrice636861(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication timeout problem |
Date: | 2021-12-29 11:32:41 |
Message-ID: | CAA5-nLB2K0YTaYtd-qUwy+QTUA0Q0fZt9_R=mFmb9pOVObgBBA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I put the instance with high level debug mode.
I try to do some log interpretation: After having finished writing the
modifications generated by the insert in the snap files,
then these files are read (restored). One minute after this work starts,
the worker process exit with an error code = 1.
I see that keepalive messages were sent before the work process work leave.
2021-12-28 10:50:01.894 CET [55792] LOCATION: WalSndKeepalive,
walsender.c:3365
...
2021-12-28 10:50:31.854 CET [55792] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG: 00000: StartTransaction(1)
name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
2021-12-28 10:50:31.907 CET [55792] LOCATION: ShowTransactionStateRec,
xact.c:5075
2021-12-28 10:50:31.907 CET [55792] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG: 00000: spill 2271 changes in
XID 14312 to disk
2021-12-28 10:50:31.907 CET [55792] LOCATION: ReorderBufferSerializeTXN,
reorderbuffer.c:2245
2021-12-28 10:50:31.907 CET [55792] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
*2021-12-28 10:50:32.110 CET [55792] DEBUG: 00000: restored 4096/22603999
changes from disk*
2021-12-28 10:50:32.110 CET [55792] LOCATION: ReorderBufferIterTXNNext,
reorderbuffer.c:1156
2021-12-28 10:50:32.110 CET [55792] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:32.138 CET [55792] DEBUG: 00000: restored 4096/22603999
changes from disk
...
*2021-12-28 10:50:35.341 CET [55794] DEBUG: 00000: sending replication
keepalive2021-12-28 10:50:35.341 CET [55794] LOCATION: WalSndKeepalive,
walsender.c:3365*
...
*2021-12-28 10:51:31.995 CET [55791] ERROR: XX000: terminating logical
replication worker due to timeout*
*2021-12-28 10:51:31.995 CET [55791] LOCATION: LogicalRepApplyLoop,
worker.c:1267*
Could this function in* Apply main loop* in worker.c help to find a
solution?
rc = WaitLatchOrSocket(MyLatch,
WL_SOCKET_READABLE | WL_LATCH_SET |
WL_TIMEOUT | WL_POSTMASTER_DEATH,
fd, wait_time,
WAIT_EVENT_LOGICAL_APPLY_MAIN);
Thanks for your help
Fabrice
On Thu, Dec 23, 2021 at 11:52 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
> wrote:
> >
> > Hello Amit,
> >
> > I was able to reproduce the timeout problem in the lab.
> > After loading more than 20 millions of rows in a table which is not
> replicated (insert command ends without error), errors related to logical
> replication processes appear in the postgres log.
> > Approximately every 5 minutes worker process is restarted. The snap
> files in the slot directory are still present. The replication system seems
> to be blocked. Why these snap files are not removed. What do they contain?
> >
>
> These contain changes of insert. I think these are not removed for
> your case as your long transaction is never finished. As mentioned
> earlier, for such cases, it is better to use 'streaming' feature
> released as part of PG-14 but anyway here we are trying to debug
> timeout problem.
>
> > I will recompile postgres with your patch to debug.
> >
>
> Okay, that might help.
>
> --
> With Regards,
> Amit Kapila.
>
From | Date | Subject | |
---|---|---|---|
Next Message | Fabrízio de Royes Mello | 2021-12-29 11:50:23 | Re: Converting WAL to SQL |
Previous Message | Peter Eisentraut | 2021-12-29 11:08:17 | Re: automatically generating node support functions |