Walsender waiting on SnapbuildSync

From: Brent Kerby <blkerby(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Walsender waiting on SnapbuildSync
Date: 2018-08-04 19:34:04
Message-ID: CAH8WVsjqRzVNSAaM68PMWt2s+4gcntAh7JpiSwFhAHY=WSRc3g@mail.gmail.com
Lists: pgsql-general

On Postgres 10.3 (on AWS RDS), I am running logical decoding using the
test_decoding output plugin, and every few minutes I am seeing pauses in
the stream, unrelated to any large transactions. About once every hour or
two, the pause is long enough that the database disconnects my client due
to exceeding wal_sender_timeout (30 seconds -- the RDS default value);
after reconnecting it is able to make progress again. My client is using
the streaming replication protocol via pgjdbc (with a status interval of 1
second). What I'm seeing is that during such a pause, the server is not
sending any data to the client:

- pg_stat_replication.sent_lsn stops advancing
- My client is blocking in a call to PGReplicationStream.read()
- pg_stat_activity shows that the walsender process has a wait_event of
'SnapbuildSync'.

In this scenario, it makes sense that the client would be timed out: pgjdbc
only sends feedback to the server at the beginning of a call to
PGReplicationStream.read(), so if a single call blocks for a long time
without receiving any data from the server, the client stops sending
feedback and the server eventually times it out.
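
One way to decouple feedback from data arrival, sketched here under
assumptions, is to poll with the non-blocking
PGReplicationStream.readPending() and call forceUpdateStatus() on a timer.
The ReplicationSource interface, the poll() helper, and the injected clock
below are hypothetical stand-ins so the loop can run without a database;
only the two method names mirror pgjdbc. This keeps the client sending
status updates during a pause, but it does not establish whether a
walsender blocked on SnapbuildSync would process them in time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongSupplier;

class FeedbackLoopSketch {

    /** Hypothetical stand-in for the two PGReplicationStream calls used below. */
    public interface ReplicationSource {
        ByteBuffer readPending() throws Exception;  // non-blocking; null if nothing is buffered
        void forceUpdateStatus() throws Exception;  // send a status update immediately
    }

    /**
     * Polls the source maxPolls times. Feedback is sent whenever
     * feedbackIntervalMs has elapsed on the supplied clock, whether or not
     * any data arrived in the meantime.
     */
    public static List<String> poll(ReplicationSource src, LongSupplier clockMs,
                                    int maxPolls, long idleSleepMs,
                                    long feedbackIntervalMs) throws Exception {
        List<String> received = new ArrayList<>();
        long lastFeedbackMs = clockMs.getAsLong();
        for (int i = 0; i < maxPolls; i++) {
            ByteBuffer msg = src.readPending();
            if (msg != null) {
                // test_decoding emits text, so decode the payload as UTF-8.
                received.add(StandardCharsets.UTF_8.decode(msg).toString());
            } else if (idleSleepMs > 0) {
                Thread.sleep(idleSleepMs);  // avoid spinning while the stream is idle
            }
            long nowMs = clockMs.getAsLong();
            if (nowMs - lastFeedbackMs >= feedbackIntervalMs) {
                src.forceUpdateStatus();
                lastFeedbackMs = nowMs;
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        // Fake source: one buffered message, then silence (simulating a pause).
        Queue<ByteBuffer> buffered = new ArrayDeque<>();
        buffered.add(ByteBuffer.wrap("COMMIT 1".getBytes(StandardCharsets.UTF_8)));
        int[] feedbackCount = {0};
        ReplicationSource fake = new ReplicationSource() {
            public ByteBuffer readPending() { return buffered.poll(); }
            public void forceUpdateStatus() { feedbackCount[0]++; }
        };
        List<String> got = poll(fake, System::currentTimeMillis, 20, 100, 1000);
        System.out.println("received=" + got + " feedbackSent=" + feedbackCount[0]);
    }
}
```

With a real stream, the anonymous class would delegate to the
PGReplicationStream instance, and the LSN bookkeeping (setAppliedLSN,
setFlushedLSN) would happen before each status update.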

My question is: why might the server be spending so much time waiting on
SnapbuildSync? The docs describe this event as follows:

"IO / SnapbuildSync / Waiting for a serialized historical catalog snapshot
to reach stable storage."

I gather that this is related to statistics collection, but I don't
understand why a walsender process would wait on such an event, or why
it would take such a long time. Any ideas?

Another thing is that when these pauses occur, they are always in between
transactions, i.e., after the client has received a COMMIT message but
before receiving the next BEGIN. The transactions before and after are
generally normal-sized ones (at most a few kilobytes of WAL), so this
doesn't appear to be related to the issues with large transactions that
have been discussed in the past.

- Brent

(Originally posted here:
https://stackoverflow.com/questions/51687322/postgres-walsender-waiting-on-snapbuildsync)
