From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se> |
Subject: | Re: Online enabling of checksums |
Date: | 2018-03-02 23:26:01 |
Message-ID: | 81bfd944-2f44-39f6-a4de-7b3876ac1455@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 03/02/2018 11:01 PM, Magnus Hagander wrote:
> On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
>
>
> On 03/02/2018 02:35 PM, Magnus Hagander wrote:
> >
> >
> > On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com <mailto:robertmhaas(at)gmail(dot)com>
> > <mailto:robertmhaas(at)gmail(dot)com <mailto:robertmhaas(at)gmail(dot)com>>> wrote:
> >
> > On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
> > <magnus(at)hagander(dot)net <mailto:magnus(at)hagander(dot)net>
> <mailto:magnus(at)hagander(dot)net <mailto:magnus(at)hagander(dot)net>>> wrote:
> > > Also if that wasn't clear -- we only do the full page write if there isn't
> > > already a checksum on the page and that checksum is correct.
> >
> > Hmm.
> >
> > Suppose that on the master there is a checksum on the page and that
> > checksum is correct, but on the standby the page contents differ in
> > some way that we don't always WAL-log, like as to hint bits, and there
> > the checksum is incorrect. Then you'll enable checksums when the
> > standby still has some pages without valid checksums, and disaster
> > will ensue.
> >
> > I think this could be hard to prevent if checksums are turned on and
> > off multiple times.
> >
> >
> > Do we ever make hintbit changes on the standby for example? If so, it
> > would definitely cause problems. I didn't realize we did, actually...
> >
>
> I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
> AFAIK that can't be true on a standby.
>
> > I guess we could get there even if we don't by:
> > * All checksums are correct
> > * Checkums are disabled (which replicates)
> > * Non-WAL logged change on the master, which updates checksum but does
> > *not* replicate
> > * Checksums re-enabled
> > * Worker sees the checksum as correct, and thus does not force a full
> > page write.
> > * Worker completes and flips checksums on which replicates. At this
> > point, if the replica reads the page, boom.
> >
>
> Maybe.
>
> My understanding of Robert's example is that you can start with an
> instance that has wal_log_hints=off, and so pages on master/standby may
> not be 100% identical. Then we do the online checksum thing, and the
> standby may get pages with incorrect checksums.
>
>
> No, in that case the master will issue full page writes for *all* pages,
> since they dind't hvae a checksum. The current patch only avoids doing
> that if the checksum on the master is correct, which it isn't when you
> start from checksums=off. So this particular problem only shows up if
> you iterate between off/on/off multiple times.
>
Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.
>
> > I guess we have to remove that optimisation. It's definitely a
> > bummer, but I don't think it's an absolute dealbreaker.
> >
>
> I agree it's not a deal-breaker. Or at least I don't see why it should
> be - any other maintenance activity on the database (freezing etc.) will
> also generate full-page writes.
>
>
> Yes.
>
>
>
> The good thing is the throttling also limits the amount of WAL, so it's
> possible to prevent generating too many checkpoints etc.
>
> I suggest we simply:
>
> 1) set the checksums to in-progress
> 2) wait for a checkpoint
> 3) use the regular logic for full-pages (i.e. first change after
> checkpoint does a FPW)
>
>
> This is very close to what it does now, except it does not wait for a
> checkpoint in #2. Why does it need that?
>
To guarantee that the page has a FPW with all the hint bits, before we
start messing with the checksums (or that setting the checksum itself
triggers a FPW).
>
> BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does
>
> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
> CHECKPOINT_IMMEDIATE);
>
> I'm rather unhappy about that - immediate checkpoints have massive
> impact on production systems, so we try not doing them (That's one of
> the reasons why CREATE DATABASE is somewhat painful). It usually
> requires a bit of thinking about when to do such commands. But in this
> case it's unpredictable when exactly the checksumming completes, so it
> may easily be in the middle of peak activity.
>
> Why not to simply wait for regular spread checkpoint, the way
> pg_basebackup does it?
>
>
> Actually, that was my original idea. I changed it for testing, and shuld
> go change it back.
>
OK
>
>
> > We could say that we keep the optimisation if wal_level=minimal for
> > example, because then we know there is no replica. But I doubt
> > that's worth it?
> >
>
> If it doesn't require a lot of code, why not? But I don't really see
> much point in doing that.
>
>
> Yeah, I doubt there are a lot of people using "minimal" these days, not
> since we changed the default.
>
Yeah. Although as I said, it depends on how much code would be needed to
enable that optimization (I guess not much). If someone is running with
wal_level=minimal intentionally, why not to help them.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2018-03-03 00:13:01 | Re: JIT compiling with LLVM v11 |
Previous Message | Andres Freund | 2018-03-02 23:14:19 | Re: Documenting commitfest Rules |