Re: CSN snapshots in hot standby

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CSN snapshots in hot standby
Date: 2024-08-13 20:13:39
Message-ID: b439edfc-c5e5-43a9-802d-4cb51ec20646@iki.fi
Lists: pgsql-hackers

On 05/04/2024 13:49, Andrey M. Borodin wrote:
>> On 5 Apr 2024, at 02:08, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:

Thanks for taking a look, Kirill!

>> maybe we need some hooks here? Or maybe, we can take CSN here from extension somehow.
>
> I really like the idea of CSN-provider-as-extension.
> But it's very important to move on with CSN, at least on standby, to make CSN actually happen some day.
> So, from my perspective, having LSN-as-CSN is already huge step forward.

Yeah, I really don't want to expand the scope of this.

Here's a new version. Rebased, and lots of comments updated.

I added a tiny cache of the CSN lookups into SnapshotData, which can
hold the values of 4 XIDs that are known to be visible to the snapshot,
and 4 invisible XIDs. The sizes are pretty arbitrary, but the idea is to
have something very small to speed up the common case where the same 1-2
XIDs are looked up repeatedly, without adding too much overhead.
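
To make the idea concrete, here is a minimal standalone sketch of such a
cache. It is not the code from the attached patch: the struct layout, the
names (XidVisCache, cache_lookup, cache_remember) and the round-robin
replacement policy are all assumptions made for illustration, and Xid is
just a stand-in for TransactionId.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;              /* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)      /* marks an empty slot */
#define CACHE_SLOTS 4

/* Hypothetical layout; the real fields in SnapshotData may differ. */
typedef struct XidVisCache
{
    Xid      visible[CACHE_SLOTS];    /* XIDs known visible to the snapshot */
    Xid      invisible[CACHE_SLOTS];  /* XIDs known invisible to the snapshot */
    unsigned next_visible;            /* round-robin cursor (assumed policy) */
    unsigned next_invisible;
} XidVisCache;

typedef enum { CACHE_MISS, CACHE_VISIBLE, CACHE_INVISIBLE } CacheResult;

static void
cache_init(XidVisCache *c)
{
    /* Zeroing sets every slot to INVALID_XID and resets both cursors. */
    *c = (XidVisCache) {0};
}

/* Linear scan of both arrays; cheap because there are only 8 entries. */
static CacheResult
cache_lookup(const XidVisCache *c, Xid xid)
{
    assert(xid != INVALID_XID);
    for (int i = 0; i < CACHE_SLOTS; i++)
    {
        if (c->visible[i] == xid)
            return CACHE_VISIBLE;
        if (c->invisible[i] == xid)
            return CACHE_INVISIBLE;
    }
    return CACHE_MISS;
}

/* Record the outcome of a full CSN lookup, overwriting the oldest slot. */
static void
cache_remember(XidVisCache *c, Xid xid, bool is_visible)
{
    assert(xid != INVALID_XID);
    if (is_visible)
    {
        c->visible[c->next_visible] = xid;
        c->next_visible = (c->next_visible + 1) % CACHE_SLOTS;
    }
    else
    {
        c->invisible[c->next_invisible] = xid;
        c->next_invisible = (c->next_invisible + 1) % CACHE_SLOTS;
    }
}
```

A visibility check would call cache_lookup() first and fall back to the
real CSN lookup only on CACHE_MISS, then call cache_remember() with the
result.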

I did some performance testing of the visibility checks using these CSN
snapshots. The tests run SELECTs with a SeqScan in a standby, over a
table where all the rows have xmin/xmax values that are still
in-progress in the primary.

Three test scenarios:

1. large-xact: one large transaction inserted all the rows. All rows
have the same XMIN, which is still in progress.

2. many-subxacts: one large transaction inserted each row in a separate
subtransaction. All rows have a different XMIN, but they're all
subtransactions of the same top-level transaction. (This causes the
subxids cache in the proc array to overflow)

3. few-subxacts: All rows are inserted, committed, and vacuum frozen.
Then, using 10 separate subtransactions, DELETE the rows in an
interleaved fashion. The XMAX values cycle like this: "1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 1, 2, 3, 4, 5, ...". The point of this is that these
sub-XIDs fit in the subxids cache in the procarray, but the pattern
defeats the simple 4-element cache that I added.
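
As a rough illustration of why a cycle of 10 XIDs defeats a 4-element
cache, the sketch below counts hits for a row sequence whose XIDs cycle
through 1..n. It assumes a single 4-slot array with round-robin
replacement; that policy and the function name count_cache_hits are my
assumptions for the example, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4

/*
 * Simulate scanning n_rows rows whose XIDs cycle 1, 2, ..., n_distinct_xids,
 * 1, 2, ... and return how many lookups are served from a 4-slot cache.
 * Replacement is round-robin, which is an assumed policy.
 */
static size_t
count_cache_hits(size_t n_rows, uint32_t n_distinct_xids)
{
    uint32_t slots[CACHE_SLOTS] = {0};  /* 0 = empty; XIDs start at 1 */
    unsigned next = 0;                  /* slot to overwrite on a miss */
    size_t   hits = 0;

    assert(n_distinct_xids > 0);

    for (size_t row = 0; row < n_rows; row++)
    {
        uint32_t xid = 1 + (uint32_t) (row % n_distinct_xids);
        int      found = 0;

        for (int i = 0; i < CACHE_SLOTS; i++)
        {
            if (slots[i] == xid)
            {
                found = 1;
                break;
            }
        }

        if (found)
            hits++;
        else
        {
            /* Miss: this is where the full CSN lookup would happen. */
            slots[next] = xid;
            next = (next + 1) % CACHE_SLOTS;
        }
    }
    return hits;
}
```

Under these assumptions, a single repeated XID (as in large-xact) misses
only on the first row, while a cycle of 10 XIDs never hits: by the time an
XID comes around again, it has already been evicted.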

The test script I used is attached. I repeated it a few times with
master and the patches here, and picked the fastest run for each. Just
eyeballing the results, there's about 10% variance in these numbers.
Smaller is better.

Master:

large-xact: 4.57732510566711
many-subxacts: 18.6958119869232
few-subxacts: 16.467698097229

Patched:

large-xact: 10.2999930381775
many-subxacts: 11.6501438617706
few-subxacts: 19.8457028865814

With cache:

large-xact: 3.68792295455933
many-subxacts: 13.3662350177765
few-subxacts: 21.4426419734955

The 'large-xact' results show that the CSN lookups are slower than the
binary search on the 'xids' array. Not a surprise. The 4-element cache
fixes the regression, which is also not a surprise.

The 'many-subxacts' results show that the CSN lookups are faster than
the current method in master when the subxids cache has overflowed.
That makes sense: on master, we always perform a lookup in pg_subtrans
if the subxids cache has overflowed, which is more or less the same
overhead as the CSN lookup. But with CSNs we avoid the binary search on
the xids array after that.

The 'few-subxacts' results show a regression when the 4-element cache is
not effective. I think that's acceptable: the CSN approach has many
benefits, and I don't think this is a very common scenario. But if
necessary, it could perhaps be alleviated with more caching, or by
trying to compensate by optimizing elsewhere.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
v2-0001-Update-outdated-comment-on-WAL-logged-locks-with-.patch text/x-patch 1.7 KB
v2-0002-XXX-add-perf-test.patch text/x-patch 5.6 KB
v2-0003-Use-CSN-snapshots-during-Hot-Standby.patch text/x-patch 128.8 KB
v2-0004-Make-SnapBuildWaitSnapshot-work-without-xl_runnin.patch text/x-patch 6.2 KB
v2-0005-Remove-the-now-unused-xids-array-from-xl_running_.patch text/x-patch 7.0 KB
v2-0006-Add-a-small-cache-to-Snapshot-to-avoid-CSN-lookup.patch text/x-patch 2.5 KB
