Re: Eager page freeze criteria clarification

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-12-21 15:56:09
Message-ID: CAAKRu_YfyOUK8Ne9=6CrqiNPNTfsP76-Gmcv-0p=KQiN1nM14A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 9, 2023 at 9:24 AM Joe Conway <mail(at)joeconway(dot)com> wrote:
>
> On 12/8/23 23:11, Melanie Plageman wrote:
> >
> > I'd be delighted to receive any feedback, ideas, questions, or review.
>
>
> This is well thought out, well described, and a fantastic improvement in
> my view -- well done!

Thanks, Joe! That means a lot! I often see work by hackers on the
mailing list that makes me think, "hey, that's
cool/clever/awesome!" but I don't give that feedback. I appreciate you
doing that!

> I do think we will need to consider distributions other than normal, but
> I don't know offhand what they will be.

Agreed. I plan to test with another distribution, though determining
which ones are useful is probably the more challenging exercise.
I imagine we will have to choose one distribution (as opposed to
supporting different distributions and choosing based on data access
patterns for a table). Even with a normal distribution, though, I
think it should be an improvement.

> However, even if we assume a more-or-less normal distribution, we should
> consider using subgroups in a way similar to Statistical Process
> Control[1]. The reasoning is explained in this quote:
>
> The Math Behind Subgroup Size
>
> The Central Limit Theorem (CLT) plays a pivotal role here. According
> to CLT, as the subgroup size (n) increases, the distribution of the
> sample means will approximate a normal distribution, regardless of
> the shape of the population distribution. Therefore, as your
> subgroup size increases, your control chart limits will narrow,
> making the chart more sensitive to special cause variation and more
> prone to false alarms.

I hadn't read anything about statistical process control until you
mentioned it. I read the link you sent and also googled around a
bit. I was under the impression that the more samples we have, the
better. But it seems like that may not be the assumption in
statistical process control?
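
To make the quoted claim concrete, here is a minimal Python sketch of how
3-sigma limits on subgroup means narrow as the subgroup size n grows. It is
only an illustration: the function name xbar_limits is hypothetical, the
sample data is synthetic (exponential, so deliberately non-normal), and the
limits use the plain center +/- 3*sigma/sqrt(n) form with a pooled
within-subgroup variance rather than the tabulated control chart constants.

```python
import math
import random
import statistics


def xbar_limits(samples, n):
    """Return (center, lower, upper) 3-sigma limits for means of subgroups of size n."""
    # Split the samples into consecutive subgroups of size n, dropping any remainder.
    groups = [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]
    means = [statistics.fmean(g) for g in groups]
    center = statistics.fmean(means)
    # Estimate the per-sample standard deviation from the pooled within-subgroup variance.
    sigma = math.sqrt(statistics.fmean([statistics.variance(g) for g in groups]))
    half_width = 3 * sigma / math.sqrt(n)
    return center, center - half_width, center + half_width


# Synthetic, non-normal data (assumption: not taken from any real workload).
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(6000)]

for n in (2, 5, 30):
    center, lower, upper = xbar_limits(data, n)
    print(f"n={n:2d}  center={center:.3f}  limit width={upper - lower:.3f}")
```

The width shrinks roughly in proportion to 1/sqrt(n), so a larger subgroup
flags smaller shifts in the mean. That is the sense in which the article says
a bigger subgroup makes the chart more sensitive and more prone to false
alarms.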

It may help us to get more specific. I'm not sure what the
relationship between "unsets" in my code and subgroup members would
be. The article you linked suggests that each subgroup should be of
size 5 or smaller. Translating that to my code, were you imagining
subgroups of "unsets" (each time we modify a page that was previously
all-visible)?
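
One possible reading of that question, sketched in Python rather than in the
patch's C: every "unset" contributes one measurement, and every five
consecutive measurements form a subgroup whose mean is what gets tracked.
Everything here is an assumption for illustration. SubgroupAccumulator,
record_unset, and the idea that each unset yields a single numeric value are
hypothetical and are not taken from the patch.

```python
from statistics import fmean

SUBGROUP_SIZE = 5  # the article's suggested upper bound on subgroup size


class SubgroupAccumulator:
    """Collect per-unset values and emit one mean per full subgroup."""

    def __init__(self, size=SUBGROUP_SIZE):
        self.size = size
        self.pending = []          # values of the subgroup still being filled
        self.subgroup_means = []   # one entry per completed subgroup

    def record_unset(self, value):
        # Called once each time a previously all-visible page is modified.
        self.pending.append(value)
        if len(self.pending) == self.size:
            self.subgroup_means.append(fmean(self.pending))
            self.pending.clear()


acc = SubgroupAccumulator()
for value in range(1, 13):  # twelve hypothetical unsets
    acc.record_unset(value)
print(acc.subgroup_means, acc.pending)
```

With twelve unsets this yields two completed subgroups and two values still
pending, so the statistics would lag by up to four unsets.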

Thanks for the feedback!

- Melanie
