Re: Assertion failure with summarize_wal enabled during pg_createsubscriber

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Assertion failure with summarize_wal enabled during pg_createsubscriber
Date: 2024-07-31 13:49:54
Message-ID: CAPpHfdtT=xAwEQL-fgRLz7vGXABFnnFReienNBJmfPwRZP8-NA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 29, 2024 at 7:20 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jul 26, 2024 at 4:10 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > 0002 could also use pg_indent and pgperltidy. And I have couple other
> > notes regarding 0002.
>
> As Tom said elsewhere, we don't have a practice of requiring perltidy
> for every commit, and I normally don't bother. The tree is
> pgindent-clean currently so I believe I got that part right.

Got it, thanks.

> > > In the process of fixing these bugs, I realized that the logic to wait
> > > for WAL summarization to catch up was spread out in a way that made
> > > it difficult to reuse properly, so this code refactors things to make
> > > it easier.
> >
> > It would be nice to split refactoring out of material logic changed.
> > This way it would be easier to review and would look cleaner in the
> > git history.
>
> I support that idea in general but felt it was overkill in this case:
> it's new code, and there was only one existing caller of the function
> that got refactored, and I'm not a huge fan of cluttering the git
> history with a bunch of tiny little refactoring commits to fix a
> single bug. I might have changed it if I'd seen this note before
> committing, though.

I understand your point. I'm also not huge fan of a flood of small
commits. Nevertheless, I find splitting refactoring from other
changes generally useful. That could be a single commit of many small
refactorings, not many small commits. The point for me is easier
review: you can expect refactoring commit to contain "isomorphic"
changes, while other commits implementing material logic changes. But
that might be a committer preference though.

> > > To make this fix work, also teach the WAL summarizer that after a
> > > promotion has occurred, no more WAL can appear on the previous
> > > timeline: previously, the WAL summarizer wouldn't switch to the new
> > > timeline until we actually started writing WAL there, but that meant
> > > that when the startup process was waiting for the WAL summarizer, it
> > > was waiting for an action that the summarizer wasn't yet prepared to
> > > take.
> >
> > I think this is a pretty long sentence, and I'm not sure I can
> > understand it correctly. Does startup process wait for the WAL
> > summarizer without this patch? If not, it's not clear for me that
> > word "previously" doesn't distribute to this part of sentence.
> > Breaking this into multiple sentences could improve the clarity for
> > me.
>
> Yes, I think that phrasing is muddled. It's too late to amend the
> commit message, but for posterity: I initially thought that I could
> fix this bug by just teaching the startup process to wait for WAL
> summarization before performing the .partial renaming, but that was
> not enough by itself. The problem is that the WAL summarizer would
> read all the records that were present in the final WAL file on the
> old timeline, but it wouldn't actually write out a summary file,
> because that only happens when it reaches XLOG_CHECKPOINT_REDO or a
> timeline switch point. Since no WAL had appeared on the new timeline
> yet, it didn't view the end of the old timeline as a switch point, so
> it just sat there waiting, without ever writing out a file; and the
> startup process sat there waiting for it. So the second part of the
> fix is to make the WAL summarizer realize that once the startup
> process has chosen a new timeline ID, no more WAL is going to appear
> on the old timeline, and thus it can (and should) write out the final
> summary file for the old timeline and prepare to read WAL from the new
> timeline.

Great, thank you for the explanation. Now that's clear.

------
Regards,
Alexander Korotkov
Supabase

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-07-31 13:50:33 Re: Proposal: Document ABI Compatibility
Previous Message Daniel Verite 2024-07-31 13:42:41 Re: Fixing backslash dot for COPY FROM...CSV