From: | Fabrice Chapuis <fabrice636861(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication timeout problem |
Date: | 2022-01-14 10:17:07 |
Message-ID: | CAA5-nLCj4psTb=YrCyyq-h96Qg7Bjm+UNynzW_9Po7efH_x8qA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
If I can follow you, I have to make the following changes:
1. In walsender.c:
static void
WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
TransactionId xid)
{
static TimestampTz sendTime = 0;
TimestampTz now = GetCurrentTimestamp();
/* Keep the worker process alive */
WalSndKeepalive(true);
/*
* Track lag no more than once per WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS to
* avoid flooding the lag tracker when we commit frequently.
*/
#define WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS 1000
if (!TimestampDifferenceExceeds(sendTime, now,
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS))
return;
LagTrackerWrite(lsn, now);
sendTime = now;
}
I put *requestReply *parameter to true, is that correct?
2. In pgoutput.c
/*
* Sends the decoded DML over wire.
*
* This is called both in streaming and non-streaming modes.
*/
static void
pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
WalSndUpdateProgress(ctx, txn->origin_lsn, change->txn->xid);
if (!is_publishable_relation(relation))
return;
...
Make a call to *WalSndUpdateProgress* in function *pgoutput_change.*
For info: the information in the log after reproducing the problem.
2022-01-13 11:19:46.340 CET [82233] LOCATION: WalSndKeepaliveIfNecessary,
walsender.c:3389
2022-01-13 11:19:46.340 CET [82233] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2022-01-13 11:19:46.340 CET [82233] LOG: 00000: attempt to send keep alive
message
2022-01-13 11:19:46.340 CET [82233] LOCATION: WalSndKeepaliveIfNecessary,
walsender.c:3389
2022-01-13 11:19:46.340 CET [82233] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2022-01-13 11:19:46.340 CET [82233] LOG: 00000: attempt to send keep alive
message
2022-01-13 11:19:46.340 CET [82233] LOCATION: WalSndKeepaliveIfNecessary,
walsender.c:3389
2022-01-13 11:19:46.340 CET [82233] STATEMENT: START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2022-01-13 11:20:46.418 CET [82232] ERROR: XX000: terminating logical
replication worker due to timeout
2022-01-13 11:20:46.418 CET [82232] LOCATION: LogicalRepApplyLoop,
worker.c:1267
2022-01-13 11:20:46.421 CET [82224] LOG: 00000: worker process: logical
replication worker for subscription 26994 (PID 82232) exited with exit code
1
2022-01-13 11:20:46.421 CET [82224] LOCATION: LogChildExit,
postmaster.c:3625
Thanks a lot for your help.
Fabrice
On Thu, Jan 13, 2022 at 2:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Jan 13, 2022 at 3:43 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
> wrote:
> >
> > first phase: postgres read WAL files and generate 1420 snap files.
> > second phase: I guess, but on this point maybe you can clarify, postgres
> has to decode the snap files and remove them if no statement must be
> applied on a replicated table.
> > It is from this point that the worker process exit after 1 minute
> timeout.
> >
>
> Okay, I think the problem could be that because we are skipping all
> the changes of transaction there is no communication sent to the
> subscriber and it eventually timed out. Actually, we try to send
> keep-alive at transaction boundaries like when we call
> pgoutput_commit_txn. The pgoutput_commit_txn will call
> OutputPluginWrite->WalSndWriteData. I think to tackle the problem we
> need to try to send such keepalives via WalSndUpdateProgress and
> invoke that in pgoutput_change when we skip sending the change.
>
> --
> With Regards,
> Amit Kapila.
>
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Rijkers | 2022-01-14 10:40:00 | Re: support for MERGE |
Previous Message | Julien Rouhaud | 2022-01-14 10:12:49 | Re: [PATCH] New default role allowing to change per-role/database settings |