Re: CREATE REPLICATION SLOT fails on a timeout

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE REPLICATION SLOT fails on a timeout
Date: 2014-05-16 20:43:31
Message-ID: 20140516204331.GE13967@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-05-16 16:37:16 -0400, Steve Singer wrote:
> I am finding that my logical walsender connections are being terminated due
> to a timeout on the CREATE REPLICATION SLOT command, with "terminating
> walsender process due to replication timeout".
>
> Below is the stack trace when this happens
>
> #3 0x000000000067df28 in WalSndCheckTimeOut (now=now@entry=453585463823871)
> at walsender.c:1748
> #4 0x000000000067eedc in WalSndWaitForWal (loc=691727888) at
> walsender.c:1216
> ...
> #9 0x0000000000680f16 in CreateReplicationSlot (cmd=0x1798b50) at
> walsender.c:800
> #10 exec_replication_command () at walsender.c:1291
> #11 0x00000000006bf4a1 in PostgresMain (argc=<optimized out>,
> argv=argv@entry=0x177db50, dbname=0x177db30 "test1",
>
> (gdb) p last_reply_timestamp
> $1 = 0
>
>
> I propose the attached patch, which sets last_reply_timestamp to now() when
> it starts processing a command, since receiving a command is hearing
> something from the client.

Hm. Yes, that's a problem.

> diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> new file mode 100644
> index 5c11d68..56a2f10
> *** a/src/backend/replication/walsender.c
> --- b/src/backend/replication/walsender.c
> *************** exec_replication_command(const char *cmd
> *** 1276,1281 ****
> --- 1276,1282 ----
> parse_rc))));
>
> cmd_node = replication_parse_result;
> + last_reply_timestamp = GetCurrentTimestamp();
>
> switch (cmd_node->type)
> {

I don't think that's going to cut it, though. Slot creation can take
longer than whatever wal_sender_timeout is set to (when there are lots of
long-running transactions). I think checking whether last_reply_timestamp
is 0 during the timeout check is more robust.
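
Roughly what I mean, as a standalone sketch rather than the actual
walsender.c code. The helper name replication_timed_out() and the
simplified TimestampTz typedef are assumptions made up for illustration;
the real check lives in WalSndCheckTimeOut() and reads the
wal_sender_timeout setting directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's TimestampTz (microseconds). */
typedef int64_t TimestampTz;

/*
 * Hypothetical helper: returns true if the walsender should be terminated
 * because the client has not replied within timeout_ms milliseconds.
 */
static bool
replication_timed_out(TimestampTz now, TimestampTz last_reply_timestamp,
                      int timeout_ms)
{
    /* A non-positive timeout is treated as "timeout disabled". */
    if (timeout_ms <= 0)
        return false;

    /*
     * A zero timestamp means no reply has been recorded yet, e.g. while
     * CREATE REPLICATION SLOT is still waiting for WAL. There is nothing
     * to measure the timeout against, so never report a timeout here.
     */
    if (last_reply_timestamp == 0)
        return false;

    /* Convert milliseconds to microseconds before comparing. */
    return now >= last_reply_timestamp + (TimestampTz) timeout_ms * 1000;
}
```

With the values from the backtrace above (last_reply_timestamp = 0), this
never reports a timeout no matter how long the slot creation takes, which
resetting the timestamp once at command start can't guarantee.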

Greetings,

Andres Freund
