Re: Add LSN <-> time conversion functionality

From: Andrey M(dot) Borodin <x4mmm(at)yandex-team(dot)ru>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ilya Kosmodemiansky <hydrobiont(at)gmail(dot)com>
Subject: Re: Add LSN <-> time conversion functionality
Date: 2024-07-06 17:36:17
Message-ID: 38F43C9F-C270-4AA4-8BB6-392D4BE75F90@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I’m doing another iteration over the patchset.

PgStartLSN = GetXLogInsertRecPtr();
Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby in HA cluster, which was doing a crash recovery of 1y WALs in a day, then is in startup for a year as a Hot Standby, and then is promoted?

lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem capable to accommodate LSN*time. And the function may return negative result, despite claiming area as a result. It’s intended, but a little misleading.

i-- > 0
Is there a point to do a backward count in the loop?
Consider dropping not one by one, but half of a stream, LSNTimeStream is ~2Kb of cache and it’s loaded as a whole to the cache..

> On 27 Jun 2024, at 07:18, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>
>> 2. Some benchmarks to proof the patch does not have CPU footprint.
>
> This is still a todo. Typically when designing a benchmark like this,
> I would want to pick a worst-case workload to see how bad it could be.
> I wonder if just a write heavy workload like pgbench builtin tpcb-like
> would be sufficient?

Increasing background writer activity to maximum and not seeing LSNTimeStream function in `perf top` seems enough to me.

>
>> === Nits ===
>> "Timeline" term is already taken.
>
> I changed it to LSNTimeStream. What do you think?
Sounds good to me.

>
>> Tests fail on Windows.
>
> I think this was because of the compiler warnings, but I need to
> double-check now.
Nope, it really looks more serious.
[12:31:25.701] FAILED: src/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj
[12:31:25.701] "cl" "-Isrc\backend\postgres_lib.a.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "/MDd" "/FIpostgres_pch.h" "/Yupostgres_pch.h" "/Fpsrc\backend\postgres_lib.a.p\postgres_pch.pch" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" "/FS" "/FdC:\cirrus\build\src\backend\postgres_lib.pdb" /Fosrc/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj "/c" ../src/backend/utils/activity/pgstat_wal.c
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(433): error C2375: 'pg_estimate_lsn_at_time': redefinition; different linkage
[12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2906): note: see declaration of 'pg_estimate_lsn_at_time'
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(434): error C2375: 'pg_estimate_time_at_lsn': redefinition; different linkage
[12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2905): note: see declaration of 'pg_estimate_time_at_lsn'
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(435): error C2375: 'pg_lsntime_stream': redefinition; different linkage
[12:31:25.858] c:\cirrus\build\src\include\utils/fmgrprotos.h(2904): note: see declaration of 'pg_lsntime_stream'

>
>> The patch lacks tests.
>
> I thought about this a bit. I wonder what kind of tests make sense.
>
> I could
> 1) Add tests with the existing stats tests
> (src/test/regress/sql/stats.sql) and just test that bgwriter is in
> fact adding to the time stream.
>
> 2) Or should I add some infrastructure to be able to create an
> LSNTimeStream and then insert values to it and do some validations of
> what is added? I did a version of this but it is just much more
> annoying with C & SQL than with python (where I tried out my
> algorithms) [2].

I think just a test which calls functions and discards the result would greatly increase coverage.

> On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> change the user-facing functions for estimating an
> LSN/time conversion to instead return a floor and a ceiling -- instead
> of linearly interpolating a guess. This would be a way to keep users
> from misunderstanding the accuracy of the functions to translate LSN
> <-> time.

I think this is a good idea. And it covers well “server restart problem”. If API just returns -inf as a boundary, caller can correctly interpret this situation.

Thanks! Looking forward to more timely freezing.

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Wienhold 2024-07-06 18:24:35 Re: XML test error on Arch Linux
Previous Message Dean Rasheed 2024-07-06 15:36:41 Simplifying width_bucket_numeric()