Re: Add LSN <-> time conversion functionality

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ilya Kosmodemiansky <hydrobiont(at)gmail(dot)com>
Subject: Re: Add LSN <-> time conversion functionality
Date: 2024-08-09 17:03:02
Message-ID: f6885752-75ef-496f-a6cc-ad759feb907f@vondra.me

On 8/9/24 17:48, Melanie Plageman wrote:
> On Fri, Aug 9, 2024 at 9:15 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>>
>> On Fri, Aug 9, 2024 at 9:09 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>>
>>> I suggest we do the simplest and most obvious algorithm possible, at
>>> least for now. Focusing on this part seems like a distraction from the
>>> freezing thing you actually want to do.
>>
>> The simplest thing to do would be to pick an arbitrary point in the
>> past (say one week) and then throw out all the points (except the very
>> oldest to avoid extrapolation) from before that cliff. I would like to
>> spend time on getting a new version of the freezing patch on the list,
>> but I think Robert had strong feelings about having a complete design
>> first. I'll switch focus to that for a bit so that perhaps you all can
>> see how I am using the time -> LSN conversion and that could inform
>> the design of the data structure.
>
> I realize this thought didn't make much sense since it is a fixed size
> data structure. We would have to use some other algorithm to get rid
> of data if there are still too many points from within the last week.
>

Not sure I understand. Why would the fixed size of the struct mean we
can't discard data that is too old?

I'd imagine we simply reclaim some of the slots and mark them as unused,
"move" the data to make space for recent data, or something like that.
Or just use something like a cyclic buffer that wraps around and
overwrites the oldest data.
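
As a rough illustration of the cyclic buffer option (a sketch only: the
struct, the function names and the slot count are made up for this
example and are not taken from any patch), it could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the real structure would pick its own size. */
#define LSNTIME_SLOTS 8

/* One (time, LSN) sample. Both types are assumptions for this sketch. */
typedef struct LsnTime
{
    int64_t     time;
    uint64_t    lsn;
} LsnTime;

/* Fixed-size cyclic buffer of samples. */
typedef struct LsnTimeRing
{
    LsnTime     slots[LSNTIME_SLOTS];
    size_t      head;       /* index of the next slot to write */
    size_t      count;      /* number of valid entries */
} LsnTimeRing;

static void
ring_init(LsnTimeRing *r)
{
    r->head = 0;
    r->count = 0;
}

/* Add a sample; once the buffer is full this overwrites the oldest one. */
static void
ring_add(LsnTimeRing *r, int64_t time, uint64_t lsn)
{
    r->slots[r->head].time = time;
    r->slots[r->head].lsn = lsn;
    r->head = (r->head + 1) % LSNTIME_SLOTS;
    if (r->count < LSNTIME_SLOTS)
        r->count++;
}

/* Return the i-th sample counting from the oldest, or NULL if out of range. */
static const LsnTime *
ring_get(const LsnTimeRing *r, size_t i)
{
    size_t      oldest;

    if (i >= r->count)
        return NULL;
    oldest = (r->head + LSNTIME_SLOTS - r->count) % LSNTIME_SLOTS;
    return &r->slots[(oldest + i) % LSNTIME_SLOTS];
}
```

The struct never grows, yet old data is discarded on every insert once
all slots are in use, so the fixed size does not get in the way.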

> In the adaptive freezing code, I use the time stream to answer a yes
> or no question. I translate a time in the past (now -
> target_freeze_duration) to an LSN so that I can determine if a page
> that is being modified for the first time after having been frozen has
> been modified sooner than target_freeze_duration (a GUC value). If it
> is, that page was unfrozen too soon. So, my use case is to produce a
> yes or no answer. It doesn't matter very much how accurate I am if I
> am wrong. I count the page as having been unfrozen too soon or I
> don't. So, it seems I care about the accuracy of data from now until
> now - target_freeze_duration + margin of error a lot and data before
> that not at all. While it is true that if I'm wrong about a page that
> was older but near the cutoff, that might be better than being wrong
> about a very recent page, it is still wrong.
>
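
For readers following along, the yes/no check described above might be
sketched roughly as follows. This is only an illustration under stated
assumptions: the names are hypothetical, the samples are assumed to be
sorted by time, linear interpolation is just one possible estimator,
and page_lsn is assumed to be the LSN recorded when the page was
frozen. None of this is taken from the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One (time, LSN) sample; types are assumptions for this sketch. */
typedef struct LsnTime
{
    int64_t     time;
    uint64_t    lsn;
} LsnTime;

/*
 * Estimate the LSN at time t by linear interpolation between the two
 * neighboring samples. pts must be sorted by ascending time with n >= 1.
 * Times outside the covered range are clamped instead of extrapolated.
 */
static uint64_t
estimate_lsn_at(const LsnTime *pts, size_t n, int64_t t)
{
    if (t <= pts[0].time)
        return pts[0].lsn;
    if (t >= pts[n - 1].time)
        return pts[n - 1].lsn;

    for (size_t i = 1; i < n; i++)
    {
        if (t <= pts[i].time)
        {
            const LsnTime *a = &pts[i - 1];
            const LsnTime *b = &pts[i];
            double      frac;

            /* a->time < t <= b->time here, so the divisor is positive */
            frac = (double) (t - a->time) / (double) (b->time - a->time);
            return a->lsn + (uint64_t) (frac * (double) (b->lsn - a->lsn));
        }
    }
    return pts[n - 1].lsn;      /* not reached */
}

/*
 * Yes/no answer: was the page frozen more recently than
 * target_freeze_duration ago, i.e. is it being unfrozen too soon?
 */
static bool
unfrozen_too_soon(const LsnTime *pts, size_t n, int64_t now,
                  int64_t target_freeze_duration, uint64_t page_lsn)
{
    uint64_t    cutoff_lsn;

    cutoff_lsn = estimate_lsn_at(pts, n, now - target_freeze_duration);
    return page_lsn > cutoff_lsn;
}
```

Only the samples around now - target_freeze_duration influence
cutoff_lsn, which is why accuracy in that region matters most for this
use case.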

Yeah. But isn't that a bit backwards? The decision can be wrong because
the estimate was too far off, or maybe it was spot on and we still made
the wrong decision. That's what happens with heuristics.

I think a natural expectation is that the quality of the answers
correlates with the accuracy of the data / estimates. With accurate
results (say we keep a perfect history, with no loss of precision for
older data) we should be making the right decision most of the time. If
not, it's a lost cause, IMHO. And with lower accuracy it'd get worse,
otherwise why would we need the detailed data?

But now that I think about it, I'm not entirely sure I understand what
point you are making :-(

regards

--
Tomas Vondra
