Re: CSN snapshots in hot standby

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CSN snapshots in hot standby
Date: 2024-08-13 21:15:01
Message-ID: CALdSSPjC+YvJe-YnC2K-7SdT3vE8T7MASRfG9XshDfoOXVANwQ@mail.gmail.com

On Wed, 14 Aug 2024 at 01:13, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 05/04/2024 13:49, Andrey M. Borodin wrote:
> >> On 5 Apr 2024, at 02:08, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
> Thanks for taking a look, Kirill!
>
> >> maybe we need some hooks here? Or maybe, we can take CSN here from extension somehow.
> >
> > I really like the idea of CSN-provider-as-extension.
> > But it's very important to move on with CSN, at least on standby, to make CSN actually happen some day.
> > So, from my perspective, having LSN-as-CSN is already huge step forward.
>
> Yeah, I really don't want to expand the scope of this.
>
> Here's a new version. Rebased, and lots of comments updated.
>
> I added a tiny cache of the CSN lookups into SnapshotData, which can
> hold the values of 4 XIDs that are known to be visible to the snapshot,
> and 4 invisible XIDs. This is pretty arbitrary, but the idea is to have
> something very small to speed up the common case where the same 1-2
> XIDs are looked up repeatedly, without adding too much overhead.
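
Just to check my understanding of the cache idea, here is roughly the
shape I imagine it takes (a minimal sketch; all names are mine, not
from the patch):

#include "postgres.h"
#include "access/transam.h"

typedef struct XidVisibilityCache
{
    TransactionId xip_visible[4];   /* XIDs known visible to the snapshot */
    TransactionId xip_invisible[4]; /* XIDs known invisible */
    int         visible_next;       /* next slot to evict, round-robin */
    int         invisible_next;
} XidVisibilityCache;

/*
 * Consult the cache before doing a CSN lookup.  Returns true on a hit
 * and sets *visible; on a miss, the caller performs the real CSN lookup
 * and records the result in the matching array, round-robin.
 */
static inline bool
XidVisibilityCacheLookup(XidVisibilityCache *cache,
                         TransactionId xid, bool *visible)
{
    for (int i = 0; i < 4; i++)
    {
        if (TransactionIdEquals(cache->xip_visible[i], xid))
        {
            *visible = true;
            return true;
        }
        if (TransactionIdEquals(cache->xip_invisible[i], xid))
        {
            *visible = false;
            return true;
        }
    }
    return false;
}

Is that roughly what the patch does?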
>
>
> I did some performance testing of the visibility checks using these CSN
> snapshots. The tests run SELECTs with a SeqScan in a standby, over a
> table where all the rows have xmin/xmax values that are still
> in-progress in the primary.
>
> Three test scenarios:
>
> 1. large-xact: one large transaction inserted all the rows. All rows
> have the same XMIN, which is still in progress
>
> 2. many-subxacts: one large transaction inserted each row in a separate
> subtransaction. All rows have a different XMIN, but they're all
> subtransactions of the same top-level transaction. (This causes the
> subxids cache in the proc array to overflow)
>
> 3. few-subxacts: All rows are inserted, committed, and vacuum frozen.
> Then, using 10 separate subtransactions, DELETE the rows in an
> interleaved fashion. The XMAX values cycle like this "1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 1, 2, 3, 4, 5, ...". The point of this is that these
> sub-XIDs fit in the subxids cache in the procarray, but the pattern
> defeats the simple 4-element cache that I added.
>
> The test script I used is attached. I repeated it a few times with
> master and the patches here, and picked the fastest runs for each. Just
> eyeballing the results, there's roughly 10% variance in these numbers.
> Smaller is better.
>
> Master:
>
> large-xact: 4.57732510566711
> many-subxacts: 18.6958119869232
> few-subxacts: 16.467698097229
>
> Patched:
>
> large-xact: 10.2999930381775
> many-subxacts: 11.6501438617706
> few-subxacts: 19.8457028865814
>
> With cache:
>
> large-xact: 3.68792295455933
> many-subxacts: 13.3662350177765
> few-subxacts: 21.4426419734955
>
> The 'large-xact' results show that the CSN lookups are slower than the
> binary search on the 'xids' array. Not a surprise. The 4-element cache
> fixes the regression, which is also not a surprise.
>
> The 'many-subxacts' results show that the CSN lookups are faster than
> the current method in master, when the subxids cache has overflowed.
> That makes sense: on master, we always perform a lookup in pg_subtrans
> if the subxids cache has overflowed, which is more or less the same
> overhead as the CSN lookup. But we avoid the binary search on the xids
> array after that.
>
> The 'few-subxacts' scenario shows a regression when the 4-element cache
> is not effective. I think that's acceptable: the CSN approach has many
> benefits, and I don't think this is a very common scenario. But if
> necessary, it could perhaps be alleviated with more caching, or by
> trying to compensate by optimizing elsewhere.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)

Thanks for the update. I will try to find time for perf-testing this.
Firstly, some random suggestions. Sorry for being too nit-picky.

1) in 0002
> +/*
> + * Number of shared CSNLog buffers.
> + */
> +static Size
> +CSNLogShmemBuffers(void)
> +{
> + return Min(32, Max(16, NBuffers / 512));
> +}

Should we make this a GUC?
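
That would also match the new SLRU sizing GUCs in v17
(transaction_buffers, subtransaction_buffers, and friends). A rough
sketch of what the guc_tables.c entry could look like; the name
"csnlog_buffers", the default, and the bounds are all placeholders:

/* in csnlog.c (hypothetical variable): */
int         csnlog_buffers = 16;

/* in guc_tables.c, in ConfigureNamesInt[]: */
{
    {"csnlog_buffers", PGC_POSTMASTER, RESOURCES_MEM,
        gettext_noop("Sets the size of the dedicated buffer pool used "
                     "for the CSN log cache."),
        NULL,
        GUC_UNIT_BLOCKS
    },
    &csnlog_buffers,
    16,
    16, SLRU_MAX_ALLOWED_BUFFERS,
    NULL, NULL, NULL
},

If we want to keep the current auto-sizing as the default, a special
value of 0 meaning "scale with shared_buffers", the way
transaction_buffers behaves, would also work.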

2) In 0002 CSNLogShmemInit:

> + //SlruPagePrecedesUnitTests(CsnlogCtl, SUBTRANS_XACTS_PER_PAGE);

remove this?

3) In 0002 InitCSNLogPage:

> + SimpleLruZeroPage(CsnlogCtl, pageno);
We could use ZeroCSNLogPage here; that would justify the existence of
this function a little bit more.
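
I.e. something like this (the signatures are guessed from the quoted
snippet, not taken from the patch):

static int
ZeroCSNLogPage(int pageno)
{
    return SimpleLruZeroPage(CsnlogCtl, pageno);
}

static void
InitCSNLogPage(int pageno)
{
    /* ... plus whatever else InitCSNLogPage does today ... */
    (void) ZeroCSNLogPage(pageno);
}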

4) In 0002:
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -27,7 +27,7 @@
> * removed. This is achieved by using the replication slot mechanism.
> *
> * As the percentage of transactions modifying the catalog normally is fairly
> - * small in comparisons to ones only manipulating user data, we keep track of
> + * small in comparison to ones only manipulating user data, we keep track of
> * the committed catalog modifying ones inside [xmin, xmax) instead of keeping
> * track of all running transactions like it's done in a normal snapshot. Note
> * that we're generally only looking at transactions that have acquired an

This change is unrelated to the 0002 patch; let's just push it as a
separate change.

Overall, 0002 looks straightforward, though big. However, I wonder how
we can test that this change does not lead to any unpleasant problems,
such as observing uncommitted changes on replicas, data corruption, and
so on. Maybe some basic injection-point-based TAP test is desirable
here, along the lines sketched below?
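
For example (purely to illustrate the idea: the point name, its
placement, and the function signature below are my assumptions, not
taken from the patch), a committing backend on the primary could be
paused right before its CSN becomes visible:

#include "utils/injection_point.h"

void
CSNLogSetCSN(TransactionId xid, XLogRecPtr csn)     /* assumed signature */
{
    /*
     * A TAP test could attach to this point with the injection_points
     * test extension, e.g. injection_points_attach('csn-before-set',
     * 'wait'), freezing the committer so the test can assert that none
     * of the transaction's rows are visible on a standby yet.
     */
    INJECTION_POINT("csn-before-set");

    /* ... record the CSN for xid in the csnlog ... */
}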

--
Best regards,
Kirill Reshke
