Redux: Throttle WAL inserts before commit

From: Shirisha Shirisha <shirisha(dot)sn(at)broadcom(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: soumyadeep2007(at)gmail(dot)com, Ashwin Agrawal <ashwinstar(at)gmail(dot)com>
Subject: Redux: Throttle WAL inserts before commit
Date: 2024-08-27 10:50:40
Message-ID: CAP3-t08umaBEUEppzBVY6==3tbdLwG7b4wfrba73zfOAUrRsoQ@mail.gmail.com
Lists: pgsql-hackers

Hello hackers,

This is an attempt to resurrect the thread [1] to throttle WAL inserts
before the point of commit.

Background:

When synchronous_commit is on, a committing transaction waits for
replication, ensuring that WAL is flushed on the standby up to the
commit LSN.

While commit is a mandatory sync/wait point, also waiting for
replication at periodic intervals along the way may be desirable, so
that a bulk writer behaves as a good citizen. Consider, for example, a
setup where the primary and standby can each write at 20GB/sec, while
the network between them can only transfer 2GB/sec. If CTAS is run on a
large table in such a setup, it can generate WAL very aggressively on
the primary, but that WAL can't be transferred to the standby at the
same rate. Hence, pending WAL builds up on the primary. This causes two
main problems:

- Fairness: new write transactions (even a single-tuple I/U/D), and
even read transactions (setting hint bits), see added latency equal to
the time needed to ship the pending WAL and flush it on the standby.

- Space: the primary needs enough space to hold that much WAL, since,
if replication slots are in use, WAL can't be recycled until it has
been shipped to the standby.
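
To put rough numbers on the build-up, here is a back-of-the-envelope
sketch. It assumes WAL is generated at the full 20GB/sec write rate and
shipped at a constant 2GB/sec; the helper name, the struct, and the
1000GB WAL volume used below are hypothetical and not part of the patch.

```c
#include <assert.h>

/* Hypothetical result type for this illustration only. */
typedef struct WalBacklog
{
    double generate_secs;   /* time the primary spends generating the WAL */
    double backlog_gb;      /* unshipped WAL left when generation ends */
    double drain_secs;      /* extra time needed to ship that backlog */
} WalBacklog;

/*
 * Simple constant-rate model: WAL is produced at gen_gb_per_sec and
 * shipped at ship_gb_per_sec from the moment generation starts.
 */
WalBacklog
estimate_wal_backlog(double wal_gb, double gen_gb_per_sec,
                     double ship_gb_per_sec)
{
    WalBacklog  r;
    double      shipped_gb;

    assert(wal_gb >= 0 && gen_gb_per_sec > 0 && ship_gb_per_sec > 0);

    r.generate_secs = wal_gb / gen_gb_per_sec;

    /* WAL shipped while generation is still running, capped at the total. */
    shipped_gb = ship_gb_per_sec * r.generate_secs;
    if (shipped_gb > wal_gb)
        shipped_gb = wal_gb;

    r.backlog_gb = wal_gb - shipped_gb;
    r.drain_secs = r.backlog_gb / ship_gb_per_sec;
    return r;
}
```

Under this model, estimate_wal_backlog(1000.0, 20.0, 2.0) gives 50
seconds of generation, a 900GB backlog, and 450 seconds to drain it.
That drain time is what a commit arriving right after the bulk load
would have to wait out, and the backlog is the space the primary has to
keep around.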

Proposed solution (patch attached):

- A global (backend-local) variable, wal_bytes_written, tracks the
amount of WAL written by the backend since the start of the transaction
or since the last time SyncRepWaitForLSN() was called for this
transaction.

- Whenever wal_bytes_written exceeds the new
wait_for_replication_threshold GUC, we set the control flag
XlogThrottlePending (similar in spirit to LogMemoryContextPending),
which is then handled at ProcessInterrupts() time. This is the
mechanism proposed in [2]. Doing it this way avoids issues such as
waiting while holding locks or inside a critical section.

- To do the wait itself, we rely on SyncRepWaitForLSN(), with the cached
value of the WAL flush point.
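
The control flow described in the three points above can be sketched as
a small standalone model. This is not the attached patch: the variable
names follow the description, but every sketch_* function, the counters,
and the "threshold of 0 disables throttling" rule are assumptions made
for illustration. In particular, sketch_wait_for_lsn() only records the
call, where the real SyncRepWaitForLSN() would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Backend-local state, named after the proposal's description. */
uint64_t wal_bytes_written = 0;
uint64_t wait_for_replication_threshold = 0;   /* bytes; 0 = off (assumed) */
volatile bool XlogThrottlePending = false;

/* Stand-ins for backend state, hypothetical and for illustration only. */
uint64_t cached_flush_lsn = 0;      /* cached WAL flush point */
int      crit_section_count = 0;    /* critical-section nesting depth */
int      throttle_waits = 0;        /* number of replication waits taken */
uint64_t last_waited_lsn = 0;       /* LSN passed to the most recent wait */

/* Stand-in for SyncRepWaitForLSN(): records the call instead of blocking. */
void
sketch_wait_for_lsn(uint64_t lsn)
{
    throttle_waits++;
    last_waited_lsn = lsn;
}

/*
 * Accounting after a WAL insert.  The insert is modeled as running inside
 * a critical section, so this code only sets a flag and never waits.
 */
void
sketch_account_wal_insert(uint64_t record_bytes)
{
    crit_section_count++;
    cached_flush_lsn += record_bytes;   /* pretend the WAL is flushed at once */
    wal_bytes_written += record_bytes;
    if (wait_for_replication_threshold > 0 &&
        wal_bytes_written > wait_for_replication_threshold)
        XlogThrottlePending = true;
    crit_section_count--;
}

/* Safe point, in the role of ProcessInterrupts(): do the deferred wait. */
void
sketch_process_interrupts(void)
{
    assert(crit_section_count == 0);    /* never wait in a critical section */
    if (!XlogThrottlePending)
        return;
    XlogThrottlePending = false;
    sketch_wait_for_lsn(cached_flush_lsn);
    wal_bytes_written = 0;              /* restart the count after waiting */
}
```

With the threshold set to 1000 bytes and three 400-byte inserts, each
followed by sketch_process_interrupts(), the flag is raised by the third
insert and exactly one wait happens, on LSN 1200. The counter then
restarts from zero. Splitting "notice" from "wait" this way is what
keeps the wait out of the insert path.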

[1] https://www.postgresql.org/message-id/flat/CAHg%2BQDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20220105174643.lozdd3radxv4tlmx%40alap3.anarazel.de

Regards,
Shirisha
Broadcom Inc.

Attachment: v1-0001-WAL-throttling-mechanism-for-synchronous-replicat.patch (application/octet-stream, 12.6 KB)
