From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Rework the way multixact truncations work |
Date: | 2015-07-02 17:58:45 |
Message-ID: | CA+TgmoYjxUYQK5oDw3r432KTvORW8y17jW6kmuZ4csW+n49keQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 2, 2015 at 11:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Will look at 0003 next.
+ appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
I don't think we typically use this style for notating intervals.
case XLOG_MULTIXACT_CREATE_ID:
id = "CREATE_ID";
break;
+ case XLOG_MULTIXACT_TRUNCATE_ID:
+ id = "TRUNCATE";
+ break;
If XLOG_MULTIXACT_CREATE_ID -> "CREATE_ID", then why not
XLOG_MULTIXACT_TRUNCATE_ID -> "TRUNCATE_ID"?
+ * too old to general truncation records.
s/general/generate/
+ MultiXactId oldestMXactDB;
Data type should be OID.
+ * Recompute limits as we are now fully started, we now can correctly
+ * compute how far a members wraparound is away.
s/,/:/ or something. This isn't particularly good English as written.
+ * Computing the actual limits is only possible once the data directory is
+ * in a consistent state. There's no need to compute the limits while
+ * still replaying WAL as no new multis can be created anyway. So we'll
+ * only do further checks after TrimMultiXact() has been called.
Multis can be and are created during replay. What this should really
say is that we have no choice about whether to create them or not: we
just have to replay whatever's there.
+ (errmsg("performing legacy multixact truncation,
upgrade master")));
This message needs work. I'm not sure exactly what it should say, but
I'm pretty sure that's not clear enough.
I seriously, seriously doubt that it is a good idea to perform the
legacy truncation from MultiXactAdvanceOldest() rather than
TruncateMultiXact(). The checkpoint hasn't really happened at that
point yet; you might truncate away stuff, then crash before the
checkpoint is complete, and then we you restart recovery, you've got
trouble. I think you should be very, very cautious about rejiggering
the order of operations here. The current situation is not good, but
casually rejiggering it can make things much worse.
- * If no multixacts exist, then oldestMultiXactId will be the next
- * multixact that will be created, rather than an existing multixact.
+ * Determine the offset of the oldest multixact. Normally, we can read
+ * the offset from the multixact itself, but there's an important special
+ * case: if there are no multixacts in existence at all, oldestMXact
+ * obviously can't point to one. It will instead point to the multixact
+ * ID that will be assigned the next time one is needed.
There's no need to change this; it means the same thing either way.
Generally, I think you've weakened the logic in SetOffsetVacuumLimit()
appreciably here. The existing code is careful never to set
oldestOffsetKnown false when it was previously true. Your rewrite
removes that property. Generally, I see no need for this function to
be overhauled to the extent that you have, and would suggest reverting
the changes that aren't absolutely required.
I don't particularly like the fact that find_multixact_start() calls
SimpleLruFlush(). I think that's really a POLA violation: you don't
expect that a function that looks like a simple inquiry is going to go
do a bunch of unrelated I/O in the background. If somebody called
find_multixact_start() with any frequency, you'd be sad. You're just
doing it this way because of the context *in which you expect
find_multixact_start* to be called, which does not seem very
future-proof. I prefer Thomas's approach.
If TruncateMultiXact() fails to acquire MultiXactTruncationLock right
away, does it need to wait, or could it ConditionalAcquire and bail
out if the lock isn't obtained?
+ * Make sure to only attempt truncation if there's values to truncate
+ * away. In normal processing values shouldn't go backwards, but there's
+ * some corner cases (due to bugs) where that's possible.
I think this comment should be more detailed. Is that talking about
the same thing as this comment:
- * Due to bugs in early releases of PostgreSQL 9.3.X and 9.4.X,
- * oldestMXact might point to a multixact that does not exist.
- * Autovacuum will eventually advance it to a value that does exist,
- * and we want to set a proper offsetStopLimit when that happens,
- * so call DetermineSafeOldestOffset here even if we're not actually
- * truncating.
This comment seems to be saying there's a race condition:
+ * XXX: It's also possible that the page that oldestMXact is on has
+ * already been truncated away, and we crashed before updating
+ * oldestMXact.
But why is that an XXX? I think this is just a case of recovery
needing tolerate redo of an action already redone.
I'm not convinced that it's a good idea to remove
lastCheckpointedOldest and replace it with nothing. It seems like a
very good idea to have two separate pieces of state in shared memory:
- The oldest offset that we think anyone might need to access to make
a visibility check for a tuple.
- The oldest offset that we still have on disk.
The latter would now need to be called something else, not
lastCheckpointedOldest, but I think it's still good to have it.
Otherwise, I don't see how you protect against the on-disk state
wrapping around before you finish truncating, and then maybe
truncation eats something that was busy getting reused. We might be
kind of hosed in that situation anyway, because TruncateMultiXact()
and some other places assume that circular comparisons will return
sensible values. But that could be fixed, and probably should be
fixed eventually.
+ (errmsg("supposedly still alive MultiXact %u not
found, skipping truncation",
Maybe "cannot truncate MultiXact %u because it does not exist on disk,
skipping truncation"?
I think "frozenMulti" is a slightly confusing variable name and
deserves a comment. AUIU, that's the oldest multiXact we need to
keep. So it's actually the oldest multi that is NOT guaranteed to be
frozen. minMulti might be a better variable name, but a comment is
justified either way.
+ * Update in-memory limits before performing the truncation, while inside
+ * the critical section: Have to do it before truncation, to prevent
+ * concurrent lookups of those values. Has to be inside the critical
+ * section asotherwise a future call to this function would error out,
+ * while looking up the oldest member in offsets, if our caller crashes
+ * before updating the limits.
Missing space: asotherwise.
Who else might be concurrently looking up those values? Nobody else
can truncate while we're truncating, because we hold
MultiXactTruncationLock. And nobody else should be getting here from
looking up tuples, because if they are, we truncated too soon.
- * Startup MultiXact. We need to do this early for two reasons: one is
- * that we might try to access multixacts when we do tuple freezing, and
- * the other is we need its state initialized because we attempt
- * truncation during restartpoints.
+ * Startup MultiXact, we need to do this early, to be able to replay
+ * truncations.
The period after "Startup MultiXact" was more correct than the comma
you've replaced it with.
Phew. That's all I see on a first read-through, but I've only spent a
couple of hours on this, so I might easily have missed some things.
But let me stop here, hit send, and see what you think of these
comments.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2015-07-02 18:10:27 | Re: Support for N synchronous standby servers - take 2 |
Previous Message | Peter Geoghegan | 2015-07-02 17:49:03 | Re: Refactoring speculative insertion with unique indexes a little |