From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Weird failure with latches in curculio on v15 |
Date: | 2023-02-09 19:29:52 |
Message-ID: | 20230209192952.jvx56yuutlxuvjjf@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2023-02-09 11:12:21 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2023 at 10:51 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I'm fairly concerned about the idea of making it common for people
> > to write their own main loop for the archiver. That means that, if
> > we have a bug fix that requires the archiver to do X, we will not
> > just be patching our own code but trying to get an indeterminate
> > set of third parties to add the fix to their code.
I'm somewhat concerned about that too, but perhaps from a different
angle. First, I think we don't do our users a service by defaulting the
in-core implementation to something that doesn't scale to even a moderately
busy server. Second, I doubt we'll get the API for any of this right, without
an actual user that does something more complicated than restoring one-by-one
in a blocking manner.
> I don't know what kind of bug we could really have in the main loop
> that would be common to every implementation. They're probably all
> going to check for interrupts, do some work, and then wait for I/O on
> some things by calling select() or some equivalent. But the work, and
> the wait for the I/O, would be different for every implementation. I
> would anticipate that the amount of common code would be nearly zero.
I don't think it's that hard to imagine problems. To be reasonably fast, a
decent restore implementation will have to 'restore ahead'. Which also
provides ample things to go wrong. E.g.
- WAL source is switched, restore module needs to react to that, but doesn't,
we end up with lots of wasted work, or worse, filename conflicts
- recovery follows a timeline, restore module doesn't catch on quickly enough
- end of recovery happens, restore just continues on
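One way a restore-ahead module could guard against all three cases is to stamp every prefetch with a generation counter and bump that counter whenever the WAL source or timeline changes, or recovery ends. The following is only a minimal sketch under that assumption; every name in it (RestoreAheadState, restore_ahead_request, and so on) is hypothetical and not an existing PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in PostgreSQL. */
#define PREFETCH_SLOTS 4

typedef struct PrefetchSlot
{
    bool     in_use;
    uint64_t segno;       /* WAL segment number that was requested */
    uint64_t generation;  /* state generation at the time of the request */
} PrefetchSlot;

typedef struct RestoreAheadState
{
    uint64_t     generation;     /* bumped whenever older requests go stale */
    uint32_t     timeline;       /* timeline recovery currently follows */
    bool         recovery_done;  /* set once recovery has ended */
    PrefetchSlot slots[PREFETCH_SLOTS];
} RestoreAheadState;

void
restore_ahead_init(RestoreAheadState *st, uint32_t timeline)
{
    assert(st != NULL);
    st->generation = 1;
    st->timeline = timeline;
    st->recovery_done = false;
    for (int i = 0; i < PREFETCH_SLOTS; i++)
        st->slots[i].in_use = false;
}

/* WAL source or timeline changed: everything requested so far is stale. */
void
restore_ahead_switch_timeline(RestoreAheadState *st, uint32_t new_timeline)
{
    st->timeline = new_timeline;
    st->generation++;
}

/* End of recovery: invalidate pending work and refuse new requests. */
void
restore_ahead_end_of_recovery(RestoreAheadState *st)
{
    st->recovery_done = true;
    st->generation++;
}

/* Record a prefetch request; false if recovery ended or no slot is free. */
bool
restore_ahead_request(RestoreAheadState *st, uint64_t segno)
{
    if (st->recovery_done)
        return false;
    for (int i = 0; i < PREFETCH_SLOTS; i++)
    {
        if (!st->slots[i].in_use)
        {
            st->slots[i].in_use = true;
            st->slots[i].segno = segno;
            st->slots[i].generation = st->generation;
            return true;
        }
    }
    return false;
}

/* True only if segno was prefetched under the current generation. */
bool
restore_ahead_lookup(const RestoreAheadState *st, uint64_t segno)
{
    for (int i = 0; i < PREFETCH_SLOTS; i++)
    {
        const PrefetchSlot *slot = &st->slots[i];

        if (slot->in_use && slot->segno == segno &&
            slot->generation == st->generation)
            return true;
    }
    return false;
}

/* Free slots from older generations; returns how many were discarded. */
int
restore_ahead_discard_stale(RestoreAheadState *st)
{
    int discarded = 0;

    for (int i = 0; i < PREFETCH_SLOTS; i++)
    {
        if (st->slots[i].in_use &&
            st->slots[i].generation != st->generation)
        {
            st->slots[i].in_use = false;
            discarded++;
        }
    }
    return discarded;
}
```

The sketch only tracks bookkeeping. A real module would also have to cancel in-flight fetches and remove partially written files, which is where the filename conflicts mentioned above would come from.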
> > If we think we need primitives to let the archiver hooks get all
> > the pending files, or whatever, by all means add those. But don't
> > cede fundamental control of the archiver. The hooks need to be
> > decoration on a framework we provide, not the framework themselves.
>
> I don't quite see how you can make asynchronous and parallel archiving
> work if the archiver process only calls into the archive module at
> times that it chooses. That would mean that the module has to return
> control to the archiver when it's in the middle of archiving one or
> more files -- and then I don't see how it can get control back at the
> appropriate time. Do you have a thought about that?
I don't think the archiver is the hard part, that already has a dedicated
process, and it also has something of a queuing system already. The startup
process imo is the complicated one...
If we had a 'restorer' process, and startup fed some sort of a queue with
things to restore in the near future, it might be more realistic to do
something like what you describe?
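To illustrate the shape of such a hand-off, here is a minimal sketch of a bounded queue that startup would fill and a restorer would drain. It is single-process and unsynchronized; a real version would presumably live in shared memory with locking and latch wakeups. All names (RestoreQueue, restore_queue_push, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in PostgreSQL. */
#define RESTORE_QUEUE_SIZE 8

typedef struct RestoreRequest
{
    uint32_t timeline;  /* timeline the segment belongs to */
    uint64_t segno;     /* WAL segment number to restore */
} RestoreRequest;

typedef struct RestoreQueue
{
    RestoreRequest items[RESTORE_QUEUE_SIZE];
    size_t         head;   /* index of the oldest pending request */
    size_t         count;  /* number of pending requests */
} RestoreQueue;

void
restore_queue_init(RestoreQueue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Startup side: enqueue a segment it expects to need soon.
 * Returns false when the queue is full, so startup never blocks here. */
bool
restore_queue_push(RestoreQueue *q, RestoreRequest req)
{
    if (q->count == RESTORE_QUEUE_SIZE)
        return false;
    q->items[(q->head + q->count) % RESTORE_QUEUE_SIZE] = req;
    q->count++;
    return true;
}

/* Restorer side: take the oldest pending request, if there is one. */
bool
restore_queue_pop(RestoreQueue *q, RestoreRequest *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % RESTORE_QUEUE_SIZE;
    q->count--;
    return true;
}
```

With this shape, startup keeps control of what gets restored and in which order, while the restorer is free to work on several queued requests at once.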
Greetings,
Andres Freund