From: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Page freezing, FSM, and WAL replay |
Date: | 2024-11-08 12:21:46 |
Message-ID: | CALdSSPiwW0556i5pvF_4x=AcFdqp0yjcgBLzPOB316Wr3TTUUg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, 8 Nov 2024 at 17:11, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> We recently had a customer report a very strange problem, involving a
> very large insert-only table: without explanation, insertions would
> stall for several seconds, causing application timeout and process
> accumulation and other nastiness.
>
> After some investigation, we narrowed this down to happening immediately
> after the first VACUUM on the table right after a standby got promoted.
> It wasn't at first obvious what the connection between these factors
> was, but eventually we realized that VACUUM must have been skipping a
> bunch of pages because they had been marked all-frozen previously, so
> the FSM was not updated with the correct freespace figures for those
> pages. The FSM pages had been transmitted as full-page images on WAL
> before the promotion (because wal_log_hints), so they contained
> optimistic numbers on the amount of free space coming from the previous
> master. (Because this only happens on the first change to that FSM page
> after a checkpoint, it's quite likely that one page every few thousand
> or so contains optimistic figures while the others remain all zeroes, or
> something like that.)
>
> Before VACUUM, nothing too bad would happen, because the upper layers of
> the FSM would not know about those optimistic numbers. But when VACUUM
> does FreeSpaceMapVacuum, it propagates those numbers upwards; as soon as
> that happens, inserters looking for pages would be told about those
> pages (wrongly catalogued to contain sufficient free space), go to
> insert there, and fail because there isn't actually any freespace; ask
> FSM for another page, lather, rinse, repeat until all those pages are
> all catalogued correctly by FSM, at which point things continue
> normally. (There are many processes doing this chase-up concurrently
> and it seems a pretty contentious process, about which see last
> paragraph; it can be seen in pg_xlogdump that it takes several seconds
> for things to settle).
>
> After considering several possible solutions, I propose to have
> heap_xlog_visible compute free space for any page being marked frozen;
> Pavan adds to that to have heap_xlog_clean compute free space for all
> pages also. This means that if we later promote this standby and VACUUM
> skips all-frozen pages, their FSM numbers are going to be up-to-date
> anyway. Patch attached.
>
>
> Now, it's possible that the problem occurs for all-visible pages not
> just all-frozen. I haven't seen that one; maybe there's some reason why
> it cannot. But fixing both things together is an easy change in the
> proposed patch: just do it on xlrec->flags != 0 rather than checking for
> the specific all-frozen flag.
>
> (This problem seems to be made worse by the fact that
> RecordAndGetPageWithFreeSpace (or rather fsm_set_and_search) holds
> exclusive lock on the FSM page for the whole duration of update plus
> search. So when there are many inserters, they all race to the update
> process. Maybe it'd be less terrible if we released the exclusive lock
> after the update and grabbed a shared lock for the search in
> fsm_set_and_search, but we still have to have the exclusive for the
> update, so the contention point remains. Maybe there's not sufficient
> improvement to make a practical difference, so I'm not proposing
> changing this.)
>
> --
> Álvaro Herrera
Hi!
Sorry to bring this up after so much time. Today, while researching
several FSM-related questions, I noticed that the comment in the
`heap_xlog_visible` function uses improper punctuation.
After some investigation, I concluded that this is an oversight in
commit ab7dbd6, which was proposed in this thread.
I'd like to propose a fix for that.
Sorry for making so much noise about such a minor matter.
--
Best regards,
Kirill Reshke
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Fixup-FSM-comment-inside-heap_xlog_visible.patch | application/octet-stream | 1.5 KB |