From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | Re: Syncrep and improving latency due to WAL throttling |
Date: | 2023-01-27 21:33:59 |
Message-ID: | 20230127213359.n6rrcgtmogedir7c@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,

On 2023-01-27 21:45:16 +0100, Tomas Vondra wrote:
> On 1/27/23 08:18, Bharath Rupireddy wrote:
> >> I think my idea of only forcing to flush/wait an LSN some distance in the past
> >> would automatically achieve that?
> >
> > I'm sorry, I couldn't get your point, can you please explain it a bit more?
> >
>
> The idea is that we would not flush the exact current LSN, because
> that's likely somewhere in the page, and we always write the whole page
> which leads to write amplification.
>
> But if we backed off a bit, and wrote e.g. to the last page boundary,
> that wouldn't have this issue (either the page was already flushed -
> noop, or we'd have to flush it anyway).

Yep.

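A minimal sketch of the "back off to the last page boundary" step, for illustration only. The typedef and the 8192-byte XLOG_BLCKSZ are local stand-ins so the snippet compiles on its own (PostgreSQL defines both itself), and wal_page_start() is a hypothetical helper name, not an existing function:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for illustration; PostgreSQL provides its own definitions. */
typedef uint64_t XLogRecPtr;
#define XLOG_BLCKSZ 8192

/*
 * Round an LSN down to the start of the WAL page containing it
 * (hypothetical helper). Flushing up to this point never forces a
 * partially filled page to be written out.
 */
static XLogRecPtr
wal_page_start(XLogRecPtr lsn)
{
    XLogRecPtr start = lsn - (lsn % XLOG_BLCKSZ);

    /* The result is on or before lsn and less than one page away from it. */
    assert(start <= lsn && lsn - start < XLOG_BLCKSZ);
    return start;
}
```

An LSN that already sits on a page boundary is returned unchanged, which corresponds to the "already flushed, noop" case.
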
> We could even back off a bit more, to increase the probability it was
> actually flushed / sent to standby.

That's not the sole goal, from my end: I'd like to avoid writing out +
flushing the WAL in chunks that are too small. Imagine a few concurrent
vacuums or COPYs or such - if we're unlucky they'd each end up exceeding
their "private" limit close to each other, leading to a number of small
writes of the WAL, which could end up increasing local commit latency / iops.

If we instead decide to only ever flush up to something like

  last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ

we'd make sure that the throttling mechanism won't cause a lot of small
writes.

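The expression above could be computed roughly as follows. This is a sketch under assumptions, not code from any patch: the typedef and the 8192-byte XLOG_BLCKSZ are local stand-ins, throttle_flush_target() is a hypothetical name, and throttle_pages is taken to be the throttling threshold expressed in WAL pages:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for illustration; PostgreSQL provides its own definitions. */
typedef uint64_t XLogRecPtr;
#define XLOG_BLCKSZ 8192

/*
 * Hypothetical helper: the furthest LSN a throttled backend would ask to
 * have flushed, i.e.
 *   last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ
 */
static XLogRecPtr
throttle_flush_target(XLogRecPtr current_lsn, uint64_t throttle_pages)
{
    XLogRecPtr boundary;
    uint64_t   backoff;

    /* Guard against overflow in the multiplication below. */
    assert(throttle_pages <= UINT64_MAX / XLOG_BLCKSZ);

    boundary = current_lsn - (current_lsn % XLOG_BLCKSZ);
    backoff = throttle_pages * XLOG_BLCKSZ / 8;

    /* Clamp at zero so the target never wraps around near the WAL start. */
    return (boundary > backoff) ? boundary - backoff : 0;
}
```

With throttle_pages = 16, for example, the backoff is two pages, so backends that hit their limit at nearby LSNs tend to ask for the same, already completed pages instead of each triggering its own small write.
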
> > Keeping replication lag under check enables one to provide a better
> > RPO guarantee as discussed in the other thread
> > https://www.postgresql.org/message-id/CAHg%2BQDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw%40mail.gmail.com.
> >
>
> Isn't that a bit over-complicated? RPO generally only cares about xacts
> that committed (because that's what you want to not lose), so why not to
> simply introduce a "sync mode" that simply uses a bit older LSN when
> waiting for the replica? Seems much simpler and similar to what we
> already do.

I don't think that really helps you that much. If there's e.g. a huge VACUUM /
COPY emitting loads of WAL you'll suddenly see the commit latency of
concurrently committing transactions spike into oblivion. In contrast, a
general WAL throttling mechanism would throttle the VACUUM without impacting
the commit latency of normal transactions.
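
To illustrate why such throttling would mostly hit bulk writers, here is a rough sketch of per-backend accounting. The struct, the function and the limit are all hypothetical and only model the "private" limit mentioned earlier; nothing here is taken from an actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-backend state for the "private" WAL limit. */
typedef struct BackendWalBudget
{
    uint64_t bytes_since_wait;  /* WAL emitted since the last throttle wait */
    uint64_t limit_bytes;       /* private limit for this backend */
} BackendWalBudget;

/*
 * Account for nbytes of newly emitted WAL. Returns true when the backend
 * has reached its private limit and should pause to flush / wait; the
 * counter is reset in that case.
 */
static bool
wal_budget_charge(BackendWalBudget *b, uint64_t nbytes)
{
    assert(b != NULL);

    b->bytes_since_wait += nbytes;
    if (b->bytes_since_wait < b->limit_bytes)
        return false;

    b->bytes_since_wait = 0;
    return true;
}
```

A backend running a huge VACUUM or COPY reaches the limit over and over and is slowed down, while a short transaction that emits little WAL never gets there and commits as usual.
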

Greetings,

Andres Freund