From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: when the startup process doesn't |
Date: | 2021-04-21 19:51:38 |
Message-ID: | 20210421195138.GQ20766@tamriel.snowman.net |
Lists: | pgsql-hackers |
Greetings,
* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2021-04-21 14:36:24 -0400, Stephen Frost wrote:
> > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > Unfortunately I think something like a percentage is hard to calculate
> > > right now. Even just looking at crash recovery (vs replication or
> > > PITR), we don't currently know where the WAL ends without reading all
> > > the WAL. The easiest thing to return would be something in LSNs or
> > > bytes and I suspect that we don't want to expose either unauthenticated?
> >
> > While it obviously wouldn't be exactly accurate, I wonder if we couldn't
> > just look at the WAL files we have to replay and then guess that we'll go
> > through about half of them before we reach the end..? I mean, wouldn't
> > exactly be the first time that a percentage progress report wasn't
> > completely accurate. :)
>
> I don't think that'd work well, due to WAL segment recycling. We rename
> WAL files into place when removing them, and sometimes that can be a
> *lot* of files. It's one thing for there to be a ~20% inaccuracy in
> the estimated amount of work, another to have misestimates of orders of
> magnitude.
I mean, we actively try to guess how many WAL files we'll need during
each checkpoint, and if we're doing that decently then, at any point in
time, we'd hopefully end up going through roughly half of the files, as
I suggested. Naturally, it'll be different if there's a forced
checkpoint or a sudden spike of activity, but I'm not sure that it's an
entirely unreasonable place to start if we're going to be guessing at it.
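To make the "about half of the files" guess concrete, here is a minimal sketch. It assumes progress is measured in whole segments and that the number of segment files in pg_wal is known. `estimate_replay_progress` and its parameters are hypothetical names for illustration, not PostgreSQL code.

```python
def estimate_replay_progress(current_segno: int, start_segno: int,
                             segments_in_pg_wal: int) -> float:
    """Estimate crash-recovery progress as a percentage (hypothetical helper).

    Assumes recovery will replay roughly half of the segment files
    currently present in pg_wal.
      current_segno      -- segment currently being replayed
      start_segno        -- segment holding the checkpoint's redo pointer
      segments_in_pg_wal -- count of segment files found in pg_wal
    """
    expected_segments = max(1, segments_in_pg_wal // 2)  # the "about half" guess
    replayed_segments = current_segno - start_segno
    pct = 100.0 * replayed_segments / expected_segments
    # Clamp: never report negative progress, and never claim 100% before
    # recovery has actually found the end of WAL.
    return min(max(pct, 0.0), 99.0)
```

With 40 segment files present, the sketch expects 20 to be replayed, so being 10 segments in reports 50%. It also shows the recycling objection quoted above. If 200 files are present mostly because recycled segments were renamed into place, but only 4 segments actually hold WAL to replay, the estimate reaches only about 4% when recovery finishes.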
> > > I wonder if we ought to occasionally update something like
> > > ControlFileData->minRecoveryPoint on primaries, similar to what we do on
> > > standbys? Then we could actually calculate a percentage, and it'd have
> > > the added advantage of allowing us to detect more cases where the end of
> > > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
> > > adding a lot of fsyncs, but that seems doable?
> >
> > This seems to go against Tom's concerns wrt rewriting pg_control.
>
> I don't think that concern equally applies for what I am proposing
> here. For one, we already have minRecoveryPoint in ControlData, and we
> already use it for the purpose of determining where we need to recover
> to, albeit only during crash recovery. Imo that's substantially
> different from adding actual recovery progress status information to the
> control file.
I agree that it's not the same as adding actual recovery progress status
information.
> I also think that it'd actually be a significant reliability improvement
> if we maintained an approximate minRecoveryPoint during normal running:
> I've seen way too many cases where WAL files were lost / removed and
> crash recovery just started up happily, only hitting problems months
> down the line. Yes, it'd obviously not be bulletproof, since we'd not
> want to add a significant stream of new fsyncs, but IME such lost/removed
> WAL file issues tend to involve many missing segments rather than a few
> hundred bytes of WAL.
I do agree that it's definitely a problem, and one that I've seen as well,
where we think we've reached the end of recovery even though we actually
haven't. Having a way to avoid that would be quite nice. It does seem
like we have some trade-offs to weigh here, but pg_control is indeed
quite small.
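As a rough illustration of the throttled minRecoveryPoint idea, here is a minimal sketch. It assumes a simple byte-based throttle and models LSNs as plain integers. `ControlFile`, `MinRecoveryPointUpdater`, `min_bytes`, `recovery_progress`, and `end_of_wal_ok` are hypothetical names, not PostgreSQL's actual ControlFileData or recovery code. The sketch shows how a recorded lower bound would let recovery both report a percentage and notice WAL that ends too early.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlFile:
    """Hypothetical stand-in for the relevant pg_control fields."""
    checkpoint_redo: int     # LSN where crash recovery starts
    min_recovery_point: int  # LSN recovery is expected to reach at least


class MinRecoveryPointUpdater:
    """Throttled advancement of min_recovery_point on a primary (hypothetical).

    The control file is only rewritten once at least `min_bytes` of new
    WAL has been flushed, which bounds the number of extra fsyncs.
    """

    def __init__(self, control: ControlFile, min_bytes: int):
        self.control = control
        self.min_bytes = min_bytes
        self.control_writes = 0  # counts simulated control-file rewrites

    def on_wal_flushed(self, flush_lsn: int) -> bool:
        if flush_lsn - self.control.min_recovery_point < self.min_bytes:
            return False  # throttled: not enough new WAL since the last update
        self.control.min_recovery_point = flush_lsn
        # In a real server this would be a pg_control write plus fsync.
        self.control_writes += 1
        return True


def recovery_progress(control: ControlFile, replay_lsn: int) -> Optional[float]:
    """Percentage of the way from the redo pointer to min_recovery_point."""
    span = control.min_recovery_point - control.checkpoint_redo
    if span <= 0:
        return None  # no usable bound recorded
    pct = 100.0 * (replay_lsn - control.checkpoint_redo) / span
    # The recorded point lags the real end of WAL, so replay may pass it.
    return min(max(pct, 0.0), 100.0)


def end_of_wal_ok(control: ControlFile, end_of_wal_lsn: int) -> bool:
    """False means WAL ended before the recorded minimum: WAL was likely lost."""
    return end_of_wal_lsn >= control.min_recovery_point
```

Because the throttle lets the recorded point lag the flushed WAL by up to `min_bytes`, losing only the last few megabytes of WAL could go unnoticed. Losing many segments would be caught, which matches the trade-off described above.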
Thanks,
Stephen