Quick Links

Re: Rework the way multixact truncations work

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Rework the way multixact truncations work
Date:	2015-12-03 09:38:45
Message-ID:	20151203093845.GA2032888@tornado.leadboat.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> > Hackers have been too reticent to revert and redo defect-laden
> > commits. If doing that is weird today, let it be normal.
>
> Why?

See my paragraph ending with the two sentences you quoted.

> Especially if reverting and redoing includes conflicts that mainly
> increases the chance of accidental bugs.

True. (That doesn't apply to these patches.)

> > git remote add community git://git.postgresql.org/git/postgresql.git
> > git remote add nmisch_github https://github.com/nmisch/postgresql.git
> > git fetch --multiple community nmisch_github
> > git diff community/master...nmisch_github/mxt4-rm-legacy
>
> That's a nearly useless diff, because it includes a lot of other changes
> (218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
> you published the changes.

Perhaps you used "git diff a..b", not "git diff a...b". If not, please send
the outputs of "git rev-parse community/master nmisch_github/mxt4-rm-legacy"
and "git --version".

> > @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> > - /*
> > - * Computing the actual limits is only possible once the data directory is
> > - * in a consistent state. There's no need to compute the limits while
> > - * still replaying WAL - no decisions about new multis are made even
> > - * though multixact creations might be replayed. So we'll only do further
> > - * checks after TrimMultiXact() has been called.
> > - */
> > + /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
> > if (!MultiXactState->finishedStartup)
> > return;
> > -
> > Assert(!InRecovery);
> >
> > - /* Set limits for offset vacuum. */
> > + /*
> > + * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> > + * call, which is only possible once the data directory is in a consistent
> > + * state. There's no need for an offset limit while still replaying WAL;
> > + * no decisions about new multis are made even though multixact creations
> > + * might be replayed.
> > + */
> > needs_offset_vacuum = SetOffsetVacuumLimit();
>
> I don't really see the benefit of this change. The previous location of
> the comment is where we return early, so it seems appropriate to
> document the reason there?

I made that low-importance change for two reasons. First, returning at that
point skips more than just the setting a limit; it also skips autovacuum
signalling and wraparound warnings. Second, the function has just computed
mxid "actual limits", so branch mxt2-cosmetic made the comment specify that we
defer an offset limit, not any and all limits.

> > static bool
> > SetOffsetVacuumLimit(void)
> > {
> > - MultiXactId oldestMultiXactId;
> > + MultiXactId oldestMultiXactId;
> > MultiXactId nextMXact;
> > - MultiXactOffset oldestOffset = 0; /* placate compiler */
> > - MultiXactOffset prevOldestOffset;
> > + MultiXactOffset oldestOffset = 0; /* placate compiler */
> > MultiXactOffset nextOffset;
> > bool oldestOffsetKnown = false;
> > + MultiXactOffset prevOldestOffset;
> > bool prevOldestOffsetKnown;
> > - MultiXactOffset offsetStopLimit = 0;
>
> I don't see the benefit of the order changes here.

I reacted the same way. Commit 4f627f8 reordered some declarations, which I
reverted when I refinished that commit as branch mxt3-main.

> > - if (oldestOffsetKnown)
> > - ereport(DEBUG1,
> > - (errmsg("oldest MultiXactId member is at offset %u",
> > - oldestOffset)));
>
> That's imo a rather useful debug message.

The branches emit that message at the same times 4f627f8^ and earlier emit it.

> > - else
> > + if (!oldestOffsetKnown)
> > + {
> > + /* XXX This message is incorrect if prevOldestOffsetKnown. */
> > ereport(LOG,
> > (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
> > oldestMultiXactId)));
> > + }
> > }
>
> Hm, the XXX is a "problem" in all 9.3+ - should we just fix it everywhere?

I welcome a project to fix it.

> > LWLockRelease(MultiXactTruncationLock);
> >
> > /*
> > - * If we can, compute limits (and install them MultiXactState) to prevent
> > - * overrun of old data in the members SLRU area. We can only do so if the
> > - * oldest offset is known though.
> > + * There's no need to update anything if we don't know the oldest offset
> > + * or if it hasn't changed.
> > */
>
> Is that really a worthwhile optimization?

I would neither remove that longstanding optimization nor add it from scratch
today. Branch commit 06c9979 restored it as part of a larger restoration to
the pre-4f627f8 structure of SetOffsetVacuumLimit().

> > -typedef struct mxtruncinfo
> > -{
> > - int earliestExistingPage;
> > -} mxtruncinfo;
> > -
> > -/*
> > - * SlruScanDirectory callback
> > - * This callback determines the earliest existing page number.
> > - */
> > -static bool
> > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
> > -{
> > - mxtruncinfo *trunc = (mxtruncinfo *) data;
> > -
> > - if (trunc->earliestExistingPage == -1 ||
> > - ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
> > - {
> > - trunc->earliestExistingPage = segpage;
> > - }
> > -
> > - return false; /* keep going */
> > -}
> > -
>
> That really seems like an independent change, deserving its own commit +
> explanation.

Indeed. I explained that change at length in
http://www.postgresql.org/message-id/20151108085407.GA1097830@tornado.leadboat.com,
including that it's alone on a branch (mxt1-disk-independent), to become its
own PostgreSQL commit.

> > [branch commit 89a7232]
>
> I don't think that's really a good idea - this way we restart after
> every single page write, whereas currently we only restart after passing
> through all pages once. In nearly all cases we'll only ever have to
> retry once in total, be because such old pages aren't usually going to
> be reread/redirtied.

Your improvement sounds fine, then. Would both SimpleLruTruncate() and
SlruDeleteSegment() benefit from it?

> > @@ -9216,10 +9212,8 @@ xlog_redo(XLogReaderState *record)
> > LWLockRelease(OidGenLock);
> > MultiXactSetNextMXact(checkPoint.nextMulti,
> > checkPoint.nextMultiOffset);
> > -
> > - MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > - checkPoint.oldestMultiDB);
> > SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
> > + SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
> >
> > /*
> > * If we see a shutdown checkpoint while waiting for an end-of-backup
> > @@ -9309,17 +9303,13 @@ xlog_redo(XLogReaderState *record)
> > LWLockRelease(OidGenLock);
> > MultiXactAdvanceNextMXact(checkPoint.nextMulti,
> > checkPoint.nextMultiOffset);
> > -
> > - /*
> > - * NB: This may perform multixact truncation when replaying WAL
> > - * generated by an older primary.
> > - */
> > - MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > - checkPoint.oldestMultiDB);
> > if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
> > checkPoint.oldestXid))
> > SetTransactionIdLimit(checkPoint.oldestXid,
> > checkPoint.oldestXidDB);
> > + MultiXactAdvanceOldest(checkPoint.oldestMulti,
> > + checkPoint.oldestMultiDB);
> > +
> > /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
> > ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
> > ControlFile->checkPointCopy.nextXid =
> > checkPoint.nextXid;
>
> Why?

master does not and will not have legacy truncation, so the deleted comment
does not belong in master. Regarding the SetMultiXactIdLimit() call:

commit 611a2ec
Author: Noah Misch <noah(at)leadboat(dot)com>
AuthorDate: Sat Nov 7 15:06:28 2015 -0500
Commit: Noah Misch <noah(at)leadboat(dot)com>
CommitDate: Sat Nov 7 15:06:28 2015 -0500

In xlog_redo(), believe a SHUTDOWN checkPoint.oldestMulti exactly.

It was so before this branch. This restores consistency with the
handling of nextXid, nextMulti and oldestMulti: we treat them as exact
for XLOG_CHECKPOINT_SHUTDOWN and as minima for XLOG_CHECKPOINT_ONLINE.
I do not know of a case where this definitely matters for any of these
counters. It might matter if a bug causes oldestXid to move forward
wrongly, causing it to then move backward later. (I don't know if
VACUUM does ever move oldestXid backward, but it's a plausible thing to
do if on-disk state fully agrees with an older value.) That example has
no counterpart for oldestMultiXactId, because any update first arrives
in an XLOG_MULTIXACT_TRUNCATE_ID record. Therefore, this commit is
probably cosmetic.

> > - /*
> > - * Truncate CLOG, multixact and CommitTs to the oldest computed value.
> > - */
> > TruncateCLOG(frozenXID);
> > TruncateCommitTs(frozenXID);
> > TruncateMultiXact(minMulti, minmulti_datoid);
>
> Why? Sure, it's not a super important comment, but ...?

Yeah, it scarcely matters either way. Commit 4f627f8 reduced this comment to
merely restating the code, so I removed it instead.

> > @@ -2204,8 +2202,8 @@ GetOldestSafeDecodingTransactionId(void)
> > * the result is somewhat indeterminate, but we don't really care. Even in
> > * a multiprocessor with delayed writes to shared memory, it should be certain
> > * that setting of delayChkpt will propagate to shared memory when the backend
> > - * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if
> > - * it's already inserted its commit record. Whether it takes a little while
> > + * takes a lock, so we cannot fail to see a virtual xact as delayChkpt if it's
> > + * already inserted its critical xlog record. Whether it takes a little while
> > * for clearing of delayChkpt to propagate is unimportant for correctness.
> > */
>
> Seems unrelated, given that this is already used in
> MarkBufferDirtyHint(). Don't get me wrong, I think the changes are a
> good idea, but it's not really tied to the truncation changes.

Quite so; its branch (one branch = one proposed PostgreSQL commit),
mxt2-cosmetic, contains no truncation changes. Likewise for the other
independent comment improvements you noted.

In response to

Re: Rework the way multixact truncations work at 2015-12-02 15:46:26 from Andres Freund

Responses

Re: Rework the way multixact truncations work at 2015-12-03 18:03:21 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kyotaro HORIGUCHI	2015-12-03 10:05:19	Re: [PROPOSAL] VACUUM Progress Checker.
Previous Message	amul sul	2015-12-03 08:52:43	Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()