Re: Syncrep and improving latency due to WAL throttling

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-01-27 11:06:49
Message-ID: CAKZiRmzyFq9Tp5=Nzcj=j2qQSbxC-f0ENef3bJovYKWFw25_3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

v2 is attached.

On Thu, Jan 26, 2023 at 4:49 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Huh? Why did you remove the GUC?

After reading previous threads, my optimism level of getting it ever
in shape of being widely accepted degraded significantly (mainly due
to the discussion of wider category of 'WAL I/O throttling' especially
in async case, RPO targets in async case and potentially calculating
global bandwidth). I've assumed that it is a working sketch, and as
such not having GUC name right now (just for sync case) would still
allow covering various other async cases in future without breaking
compatibility with potential name GUC changes (see my previous
"wal_throttle_larger_transactions=<strategies>" proposal ). Although
I've restored the synchronous_commit_flush_wal_after GUC into v2
patch, sticking to such a name removes the way of using the code to
achieve async WAL throttling in future.

> Clearly this isn't something we can default to on..

Yes, I agree. It's never meant to be on by default.

> I don't like the resets around like this. Why not just move it into
> XLogFlush()?

Fixed.

> Hm - wonder if it'd be better to determine the LSN to wait for in
> XLogInsertRecord(). There'll be plenty cases where we'll insert multiple (but
> bounded number of) WAL records before processing interrupts. No need to flush
> more aggressively than necessary...

Fixed.

> SyncRepWaitForLSN() has this comment:
> * 'lsn' represents the LSN to wait for. 'commit' indicates whether this LSN
> * represents a commit record. If it doesn't, then we wait only for the WAL
> * to be flushed if synchronous_commit is set to the higher level of
> * remote_apply, because only commit records provide apply feedback.

Hm, not sure if I understand: are you saying that we should (in the
throttled scenario) have some special feedback msgs or not --
irrespective of the setting? To be honest the throttling shouldn't
wait for the standby full setting, it's just about slowdown fact (so
IMHO it would be fine even in remote_write/remote_apply scenario if
the remote walreceiver just received the data, not necessarily write
it into file or wait for for applying it). Just this waiting for a
round-trip ack about LSN progress would be enough to slow down the
writer (?). I've added some timing log into the draft and it shows
more or less constantly solid RTT even as it stands:

psql:insertBIG.sql:2: WARNING: throttling WAL down on this session
(backendWalInserted=262632, LSN=2/A6CB70B8 flushingTo=2/A6CB6000)
psql:insertBIG.sql:2: WARNING: throttling WAL down on this session -
end (10.500052 ms)
psql:insertBIG.sql:2: WARNING: throttling WAL down on this session
(backendWalInserted=262632, LSN=2/A6CF7C08 flushingTo=2/A6CF6000)
psql:insertBIG.sql:2: WARNING: throttling WAL down on this session -
end (10.655370 ms)
psql:insertBIG.sql:2: WARNING: throttling WAL down on this session
(backendWalInserted=262632, LSN=2/A6D385E0 flushingTo=2/A6D38000)
psql:insertBIG.sql:2: WARNING: throttling WAL down on this session -
end (10.627334 ms)
psql:insertBIG.sql:2: WARNING: throttling WAL down on this session
(backendWalInserted=262632, LSN=2/A6D78FA0 flushingTo=2/A6D78000)
[..]

> I think we'd want a distinct wait event for this.

Added and tested that it shows up.

> > +volatile sig_atomic_t XLogDelayPending = false;
> I don't think the new variable needs to be volatile, or even
> sig_atomic_t. We'll just manipulate it from outside signal handlers.

Changed to bool, previously I wanted it to "fit" the above ones.

-J.

Attachment Content-Type Size
v2-0001-WIP-Syncrep-and-improving-latency-due-to-WAL-thro.patch application/octet-stream 11.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2023-01-27 11:28:59 Re: Syncrep and improving latency due to WAL throttling
Previous Message Bharath Rupireddy 2023-01-27 11:04:39 Re: Syncrep and improving latency due to WAL throttling