From: | Will Mortensen <will(at)extrahop(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Marco Slot <marco(dot)slot(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jacob Speidel <jacob(at)extrahop(dot)com>, Yvonne Chen <yvonne(at)extrahop(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Exposing the lock manager's WaitForLockers() to SQL |
Date: | 2024-05-30 07:01:32 |
Message-ID: | CAMpnoC4AZ963rzce_HOi2drB9TkB6SPhCUtspEPJUzQiGJ5KNA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I got some very helpful off-list feedback from Robert Haas that this
needed more self-contained explanation/motivation. So here goes. :-)
This patch set adds a new SQL function pg_wait_for_lockers(), which
waits for transactions holding specified table locks to commit or roll
back. This can be useful with knowledge of the queries in those
transactions, particularly for asynchronous and incremental processing
of inserted/updated rows.
Specifically, consider a scenario where INSERTs and UPDATEs always set
a serial column to its default value. A client can call
pg_sequence_last_value() + pg_wait_for_lockers() and then take a new
DB snapshot and know that rows committed after this snapshot will have
values of the serial column greater than the value from
pg_sequence_last_value(). As shown in the example at the end, this
allows the client to asynchronously and incrementally read
inserted/updated rows with minimal per-client state, without buffering
changes, and without affecting writer transactions.
There are lots of other ways to support incrementally reading new
rows, but they don’t have all of those qualities. For example:
* Forcing writers to commit in a specific order (e.g. by serial column
value) would reduce throughput
* Explicitly tracking or coordinating with writers would likely be
more complex, impact performance, and/or require much more state
* Methods that are synchronous or buffer/queue changes are problematic
if readers fall behind
Existing ways to wait for table locks also have downsides:
* Taking a conflicting lock with LOCK blocks new transactions from
taking the lock of interest while LOCK waits. And in order to wait for
writers holding RowExclusiveLock, we must take ShareLock, which also
conflicts with ShareUpdateExclusiveLock and therefore unnecessarily
interferes with (auto)vacuum. Finally, with multiple tables LOCK locks
them one at a time, so it waits (and holds locks) longer than
necessary.
* Using pg_locks / pg_lock_status() to identify the transactions
holding the locks is more expensive since it also returns all other
locks in the DB cluster, plus there’s no efficient built-in way to
wait for the transactions to commit or roll back.
By contrast, pg_wait_for_lockers() doesn’t block other transactions,
waits on multiple tables in parallel, and doesn’t spend time looking
at irrelevant locks.
This change is split into three patches for ease of review. The first
two patches modify the existing WaitForLockers() C function and other
locking internals to support waiting for lockers in a single lock
mode, which allows waiting for INSERT/UPDATE without waiting for
vacuuming. These changes could be omitted at the cost of unnecessary
waiting, potentially for a long time with slow vacuums. The third
patch adds the pg_wait_for_lockers() SQL function, which just calls
WaitForLockers().
FWIW, another solution might be to directly expose the functions that
WaitForLockers() calls, namely GetLockConflicts() (generalized to
GetLockers() in the first patch) to identify the transactions holding
the locks, and VirtualXactLock() to wait for each transaction to
commit or roll back. That would be more complicated for the client but
could be more broadly useful. I could investigate that further if it
seems preferable.
=== Example ===
Assume we have the following table:
CREATE TABLE page_views (
id bigserial,
view_time timestamptz
);
which is only ever modified by (potentially concurrent) INSERT
commands that assign the default value to the id column. We can run
the following commands:
SELECT pg_sequence_last_value('page_views_id_seq');
pg_sequence_last_value
------------------------
4
SELECT pg_wait_for_lockers(array['page_views']::oid, regclass[],
'RowExclusiveLock', FALSE);
Now we know that all rows where id <= 4 have been committed or rolled
back, and we can observe/process them:
SELECT * FROM page_views WHERE id <= 4;
id | view_time
----+-------------------------------
2 | 2024-01-01 12:34:01.000000-00
3 | 2024-01-01 12:34:00.000000-00
Later we can iterate:
SELECT pg_sequence_last_value('page_views_id_seq');
pg_sequence_last_value
------------------------
9
SELECT pg_wait_for_lockers(array['page_views']::oid, regclass[],
'RowExclusiveLock', FALSE);
We already observed all the rows where id <= 4, so this time we can
filter them out:
SELECT * FROM page_views WHERE id > 4 AND id <= 9;
id | view_time
----+-------------------------------
5 | 2024-01-01 12:34:05.000000-00
8 | 2024-01-01 12:34:04.000000-00
9 | 2024-01-01 12:34:07.000000-00
We can continue iterating like this to incrementally observe more
newly inserted rows. Note that the only state we persist across
iterations is the value returned by pg_sequence_last_value().
In this example, we processed inserted rows exactly once. Variations
are possible for handling updates, as discussed in the original email,
and I could explain that again better if it would be helpful. :-)
Attachment | Content-Type | Size |
---|---|---|
v12-0002-Allow-specifying-single-lockmode-in-WaitForLocke.patch | application/octet-stream | 7.8 KB |
v12-0001-Refactor-GetLockConflicts-into-more-general-GetL.patch | application/octet-stream | 11.3 KB |
v12-0003-Add-pg_wait_for_lockers-function.patch | application/octet-stream | 27.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Will Mortensen | 2024-05-30 07:10:57 | Re: Exposing the lock manager's WaitForLockers() to SQL |
Previous Message | vignesh C | 2024-05-30 06:43:41 | Re: Improving the latch handling between logical replication launcher and worker processes. |