From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-10-12 15:50:19
Message-ID: CAAKRu_Z-wRWCFc8-8iA5ZS_yh-80a+ZosGW60mvG_fBdWHrRSQ@mail.gmail.com
Lists: pgsql-hackers
On Wed, Oct 11, 2023 at 8:43 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Robert, Melanie and I spent an evening discussing this topic around
> pgconf.nyc. Here are, mildly revised, notes from that:
Thanks for taking notes!
> The main thing we are worried about is repeated freezing / unfreezing of
> pages within a relatively short time period.
>
> - Computing an average "modification distance" as I (Andres) proposed for
> each page is complicated / "fuzzy"
>
> The main problem is that it's not clear how to come up with a good number
> for workloads that have many more inserts into new pages than modifications
> of existing pages.
>
> It's also hard to use average for this kind of thing, e.g. in cases where
> new pages are frequently updated, but also some old data is updated, it's
> easy for the updates to the old data to completely skew the average, even
> though that shouldn't prevent us from freezing.
>
> - We also discussed an idea by Robert to track the number of times we need to
> dirty a page when unfreezing and to compare that to the number of pages
> dirtied overall (IIRC), but I don't think we really came to a conclusion
> around that - and I didn't write down anything so this is purely from
> memory.
I was under the impression that we decided we still had to consider
the number of clean pages dirtied as well as the number of pages
unfrozen. The number of pages frozen and unfrozen over a time period
gives us some idea of whether we are freezing the wrong pages -- but it
doesn't tell us whether we are freezing the right pages. A riff on an
earlier example by Robert:
While vacuuming a relation, we freeze 100 pages. During the same time
period, we modify 1,000,000 previously clean pages. Of these 1,000,000
pages modified, 90 were frozen. So we unfroze 90% of the pages frozen
during this time. Does this mean we should back off of trying to
freeze any pages in the relation?
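To illustrate why both numbers matter, here is a tiny standalone sketch
of the two ratios from that example. The struct and counter names are
hypothetical (nothing like this exists in PostgreSQL today); it only
shows the arithmetic:

    /* Illustrative only: these counters do not exist in PostgreSQL. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct RelFreezeStats
    {
        uint64_t pages_frozen;        /* pages frozen by vacuum in the period */
        uint64_t pages_unfrozen;      /* frozen pages later dirtied (unfrozen) */
        uint64_t clean_pages_dirtied; /* previously clean pages dirtied overall */
    } RelFreezeStats;

    int
    main(void)
    {
        /* Numbers from the example above. */
        RelFreezeStats s = {.pages_frozen = 100,
                            .pages_unfrozen = 90,
                            .clean_pages_dirtied = 1000000};

        /* 90% of what we froze was later unfrozen: freezing looks badly
         * targeted in isolation... */
        double unfreeze_rate = (double) s.pages_unfrozen / s.pages_frozen;

        /* ...but unfreezes are only 0.009% of all pages dirtied, so freezing
         * contributes almost nothing to write amplification here. */
        double unfreeze_share = (double) s.pages_unfrozen / s.clean_pages_dirtied;

        printf("unfreeze rate: %.0f%%, share of dirties: %.4f%%\n",
               unfreeze_rate * 100, unfreeze_share * 100);
        return 0;
    }

Looking only at the first ratio suggests backing off; including the
second shows the wasted freezes are a negligible fraction of overall
page dirtying.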
> A rough sketch of a freezing heuristic:
...
> - Attributing "unfreezes" to specific vacuums would be powerful:
>
> - "Number of pages frozen during vacuum" and "Number of pages unfrozen that
> were frozen during the same vacuum" provides numerator / denominator for
> an "error rate"
>
> - We can perform this attribution by comparing the page LSN with recorded
> start/end LSNs of recent vacuums
While implementing a rough sketch of this, I realized I had a question
about the attribution.
vacuum 1 starts at lsn 10 and ends at lsn 200. It froze 100 pages.
vacuum 2 then starts at lsn 600.
5 frozen pages with page lsn > 10 and < 200 were updated. We count
those in vacuum 1's stats. 3 frozen pages with page lsn > 200 and <
600 were updated. Do we count those somewhere?
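A minimal standalone sketch of that attribution, just to make the gap
concrete (the struct and function are hypothetical; only the comparison
logic is the point):

    /* Illustrative only: hypothetical bookkeeping, not PostgreSQL code. */
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;

    typedef struct VacuumLSNRange
    {
        XLogRecPtr start_lsn;
        XLogRecPtr end_lsn;
        uint64_t   pages_frozen;
        uint64_t   pages_unfrozen;
    } VacuumLSNRange;

    /*
     * Attribute an unfreeze to the vacuum whose LSN range contains the
     * page's LSN.  Returns that vacuum's index, or -1 when the page LSN
     * falls between vacuums -- the "lsn > 200 and < 600" case above.
     */
    static int
    attribute_unfreeze(VacuumLSNRange *vacuums, int nvacuums, XLogRecPtr page_lsn)
    {
        for (int i = 0; i < nvacuums; i++)
        {
            if (page_lsn > vacuums[i].start_lsn && page_lsn < vacuums[i].end_lsn)
            {
                vacuums[i].pages_unfrozen++;
                return i;
            }
        }
        return -1;              /* not covered by any recorded vacuum */
    }

    int
    main(void)
    {
        VacuumLSNRange vacuums[] = {{.start_lsn = 10, .end_lsn = 200,
                                     .pages_frozen = 100}};

        printf("page lsn 150 -> vacuum %d\n", attribute_unfreeze(vacuums, 1, 150));
        printf("page lsn 300 -> vacuum %d\n", attribute_unfreeze(vacuums, 1, 300));
        return 0;
    }

The -1 case is exactly the 3 pages I'm unsure about.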
> - This approach could provide "goals" for opportunistic freezing in a
> somewhat understandable way. E.g. aiming to rarely unfreeze data that has
> been frozen within 1h/1d/...
Similar to the above question, if we are tracking pages frozen and
unfrozen during a time period and there are many vacuums in quick
succession, we might still care that a page frozen by one vacuum was
unfrozen during a subsequent vacuum, since not much time has passed.
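A minimal sketch of checking such a time-based goal (the constant and
bookkeeping are hypothetical; the point is only that the check at
unfreeze time ignores vacuum boundaries and looks at elapsed time):

    /* Illustrative only: hypothetical freeze-duration bookkeeping. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Goal: frozen data should stay frozen at least this long (e.g. 1h). */
    #define FREEZE_TARGET_SECS (60 * 60)

    /*
     * Decide, at unfreeze time, whether the freeze "paid off".  A page
     * frozen by one vacuum and unfrozen during the very next vacuum still
     * counts as premature if the vacuums ran in quick succession.
     */
    static bool
    unfreeze_was_premature(time_t frozen_at, time_t unfrozen_at)
    {
        return (unfrozen_at - frozen_at) < FREEZE_TARGET_SECS;
    }

    int
    main(void)
    {
        time_t frozen_at = 1000000;

        /* Unfrozen 10 minutes later by a different vacuum: premature (1). */
        printf("%d\n", unfreeze_was_premature(frozen_at, frozen_at + 600));

        /* Unfrozen two days later: the freeze paid off (0). */
        printf("%d\n", unfreeze_was_premature(frozen_at, frozen_at + 2 * 24 * 3600));
        return 0;
    }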
- Melanie