Re: Add LSN <-> time conversion functionality

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ilya Kosmodemiansky <hydrobiont(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Add LSN <-> time conversion functionality
Date: 2024-08-01 01:12:15
Message-ID: CAAKRu_Z+skOOdx4cESzmPT-74v_LHs8OGuav9FUBUrm9kfEO8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review! v6 attached.

On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> PgStartLSN = GetXLogInsertRecPtr();
> Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby in HA cluster, which was doing a crash recovery of 1y WALs in a day, then is in startup for a year as a Hot Standby, and then is promoted?

So, I don't think we will allow use of the LSNTimeStream on a standby,
since it is unclear what it would mean on a standby. For example, do
you want to know the time the LSN was generated or the time it was
replayed? Note that bgwriter won't insert values to the time stream on
a standby (it explicitly checks).

But, you bring up an issue that I don't quite know what to do about.
If the standby doesn't have an LSNTimeStream, then when it is
promoted, LSN <-> time conversions of LSNs and times before the
promotion seem impossible. Maybe if the stats file is getting written
out at checkpoints, we could restore from that previous primary's file
after promotion?

This brings up a second issue, which is that, currently, bgwriter
won't insert into the time stream when wal_level is minimal. I'm not
sure exactly how to resolve it because I am relying on the "last
important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle
when the bgwriter actually inserts new records into the LSNTimeStream.
I could come up with a different timeout interval for updating the
time stream. Perhaps I should do that?

> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem capable to accommodate LSN*time. And the function may return negative result, despite claiming area as a result. It’s intended, but a little misleading.

Ah, great point. I've fixed this.

> i-- > 0
> Is there a point to do a backward count in the loop?
> Consider dropping not one by one, but half of a stream, LSNTimeStream is ~2Kb of cache and it’s loaded as a whole to the cache..

Yes, the backwards looping was super confusing. It was a relic of my
old design. Even without your point about cache locality, the code is
much harder to understand with the backwards looping. I've changed the
array to fill forwards and be accessed with forward loops.

> > On 27 Jun 2024, at 07:18, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> >
> >> 2. Some benchmarks to proof the patch does not have CPU footprint.
> >
> > This is still a todo. Typically when designing a benchmark like this,
> > I would want to pick a worst-case workload to see how bad it could be.
> > I wonder if just a write heavy workload like pgbench builtin tpcb-like
> > would be sufficient?
>
> Increasing background writer activity to maximum and not seeing LSNTimeStream function in `perf top` seems enough to me.

I've got this on my TODO.

> >> Tests fail on Windows.
> >
> > I think this was because of the compiler warnings, but I need to
> > double-check now.
> Nope, it really looks more serious.
> [12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(433): error C2375: 'pg_estimate_lsn_at_time': redefinition; different linkage

Ah, yes. I mistakenly added the functions to pg_proc.dat and also
called PG_FUNCTION_INFO_V1 for the functions. I've fixed it.

> >> The patch lacks tests.
> >
> > I thought about this a bit. I wonder what kind of tests make sense.
> >
> > I could
> > 1) Add tests with the existing stats tests
> > (src/test/regress/sql/stats.sql) and just test that bgwriter is in
> > fact adding to the time stream.
> >
> > 2) Or should I add some infrastructure to be able to create an
> > LSNTimeStream and then insert values to it and do some validations of
> > what is added? I did a version of this but it is just much more
> > annoying with C & SQL than with python (where I tried out my
> > algorithms) [2].
>
> I think just a test which calls functions and discards the result would greatly increase coverage.

I've added tests of the two main conversion functions. I didn't add a
test of the function which gets the whole stream (pg_lsntime_stream())
because I don't think I can guarantee it won't be empty -- so I'm not
sure what I could do with it in a test.

> > On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > change the user-facing functions for estimating an
> > LSN/time conversion to instead return a floor and a ceiling -- instead
> > of linearly interpolating a guess. This would be a way to keep users
> > from misunderstanding the accuracy of the functions to translate LSN
> > <-> time.
>
> I think this is a good idea. And it covers well “server restart problem”. If API just returns -inf as a boundary, caller can correctly interpret this situation.

Providing "ceiling" and "floor" user functions is still a TODO for me,
however, I think that the patch mostly does handle server restarts.

In the event of a restart, the cumulative stats system will have
persisted our time stream, so the LSNTimeStream will just be read back
in with the rest of the stats. I've added logic to ensure that if the
PgStartLSN is newer than our oldest LSNTimeStream entry, we use the
oldest entry instead of PgStartLSN when doing conversions LSN <->
time.

As for a crash, stats do not persist crashes, but I think Michael's
patch will go in to write out the stats file at checkpoints, and then
this should be good enough.

Is there anything else you think that is an issue with restarts?

> Thanks! Looking forward to more timely freezing.

Thanks! I'll be posting a new version of the opportunistic freezing
patch that uses the time stream quite soon, so I hope you'll take a
look at that as well!

- Melanie

Attachment Content-Type Size
v6-0005-Add-time-LSN-translation-functions.patch text/x-patch 8.1 KB
v6-0001-Record-LSN-at-postmaster-startup.patch text/x-patch 2.6 KB
v6-0003-Add-LSNTimeStream-to-PgStat_WalStats.patch text/x-patch 1.8 KB
v6-0002-Add-LSNTimeStream-for-converting-LSN-time.patch text/x-patch 13.0 KB
v6-0004-Bgwriter-maintains-global-LSNTimeStream.patch text/x-patch 1.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-08-01 01:59:11 RE: Remove duplicate table scan in logical apply worker and code refactoring
Previous Message Thomas Munro 2024-08-01 00:33:42 Re: Remove last traces of HPPA support