From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pgsql: Fix a couple of bugs in MultiXactId freezing |
Date: | 2013-12-11 23:12:33 |
Message-ID: | 20131211231233.GC536@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > I don't so much have a problem with exporting CreateMultiXactId(), just
> > with exporting it under its current name. It's already quirky to have
> > both MultiXactIdCreate and CreateMultiXactId() in multixact.c but
> > exporting it imo goes to far.
>
> MultiXactidCreateFromMembers(int, MultiXactMembers *) ?
Works for me.
> > > Andres mentioned the idea of sharing some code between
> > > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > > explored that.
> >
> > My idea would just be to have heap_tuple_needs_freeze() call
> > heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> > that's slightly more expensive than the current
> > heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> > cleanup lock on a page, so that seems ok.
>
> Doesn't seem a completely bad idea, but let's leave it for a separate
> patch. This should be changed in master only IMV anyway, while the rest
> of this patch is to be backpatched to 9.3.
I am not so sure it shouldn't be backpatched together with this. We now
have similar complex logic in both functions.
> > > ! if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > ! !TransactionIdDidAbort(members[i].xid))#
> >
> > It makes me wary to see a DidAbort() without a previous InProgress()
> > call. Also, after we crashed, doesn't DidAbort() possibly return
> > false for transactions that were in progress before we crashed? At
> > least that's how I always understood it, and how tqual.c is written.
>
> Yes, that's correct. But note that here we're not doing a tuple
> liveliness test, which is what tqual.c is doing. What we do with this
> info is to keep the Xid as part of the multi if it's still running or
> committed. We also keep it if the xact crashed, which is fine because
> the Xid will be removed by some later step. If we know for certain that
> the update Xid is aborted, then we can ignore it, but this is just an
> optimization and not needed for correctness.
But why deviate that way? It doesn't seem to save us much?
> One interesting bit is that we might end up creating singleton
> MultiXactIds when freezing, if there's no updater and there's a running
> locker. We could avoid this (i.e. mark the tuple as locked by a single
> Xid) but it would complicate FreezeMultiXactId's API and it's unlikely
> to occur with any frequency anyway.
Yea, that seems completely fine.
> > I don't think there's a need to check for
> > TransactionIdIsCurrentTransactionId() - vacuum can explicitly *not* be
> > run inside a transaction.
>
> Keep in mind that freezing can also happen for tuples handled during a
> table-rewrite operation such as CLUSTER.
Good point.
> > > if (tuple->t_infomask & HEAP_MOVED_OFF)
> > > ! frz->frzflags |= XLH_FREEZE_XVAC;
> > > else
> > > ! frz->frzflags |= XLH_INVALID_XVAC;
> > >
> >
> > Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
> > XLH_INVALID_XVAC exactly the other way round? I really don't understand
> > the moved in/off, since the code has been gone longer than I've followed
> > the code...
>
> Yep, fixed.
I wonder how many of the HEAP_MOVED_* cases around are actually
correct... What was the last version those were generated? 8.4?
> (I was toying with the "desc"
> code because it misbehaves when applied on records as they are created,
> as opposed to being applied on records as they are replayed. I'm pretty
> sure everyone already knows about this, and it's the reason why
> everybody has skimped from examining arrays of things stored in followup
> data records. I was naive enough to write code that tries to decode the
> followup record that contains the members of the multiaxct we're
> creating, which works fine during replay but gets them completely wrong
> during regular operation. This is the third time I'm surprised by this
> misbehavior; blame my bad memory for not remembering that it's not
> supposed to work in the first place.)
I am not really sure what you are talking about. That you cannot
properly decode records before they have been processed by XLogInsert()?
If so, yes, that's pretty clear and I am pretty sure it will break in
lots of places if you try?
> Right now there is one case in this code that returns
> FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> work to keep the Multi as is and return FRM_NOOP instead; and it also
> returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> instead. Neither does any great damage, but there is a consideration
> that future examiners of the tuple would have to resolve the MultiXact
> by themselves (==> performance hit). On the other hand, returning
> INVALIDATE causes the block to be dirtied, which is undesirable if not
> already dirty.
Otherwise it will be marked dirty the next time reads the page, so I
don't think this is problematic.
> ! {
> ! if (ISUPDATE_from_mxstatus(members[i].status))
> ! {
> ! /*
> ! * It's an update; should we keep it? If the transaction is known
> ! * aborted then it's okay to ignore it, otherwise not. (Note this
> ! * is just an optimization and not needed for correctness, so it's
> ! * okay to get this test wrong; for example, in case an updater is
> ! * crashed, or a running transaction in the process of aborting.)
> ! */
> ! if (!TransactionIdDidAbort(members[i].xid))
> ! {
> ! newmembers[nnewmembers++] = members[i];
> ! Assert(!TransactionIdIsValid(update_xid));
> !
> ! /*
> ! * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> ! * the Xid in cache. Again, this is just an optimization, so
> ! * it's not a problem if the transaction is still running and
> ! * in the process of committing.
> ! */
> ! if (TransactionIdDidCommit(update_xid))
> ! update_committed = true;
> !
> ! update_xid = newmembers[i].xid;
> ! }
I don't think the conclusions here are correct - we might be setting
HEAP_XMAX_COMMITTED a smudge to early that way. If the updating
transaction is in progress, there's the situation that we have updated
the clog, but have not yet removed ourselves from the procarray. I.e. a
situation in which TransactionIdIsInProgress() and
TransactionIdDidCommit() both return true. Afaik it is only correct to
set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.
> ! /*
> ! * Checking for very old update Xids is critical: if the update
> ! * member of the multi is older than cutoff_xid, we must remove
> ! * it, because otherwise a later liveliness check could attempt
> ! * pg_clog access for a page that was truncated away by the
> ! * current vacuum. Note that if the update had committed, we
> ! * wouldn't be freezing this tuple because it would have gotten
> ! * removed (HEAPTUPLE_DEAD) by HeapTupleSatisfiesVacuum; so it
> ! * either aborted or crashed. Therefore, ignore this update_xid.
> ! */
> ! if (TransactionIdPrecedes(update_xid, cutoff_xid))
> ! {
> ! update_xid = InvalidTransactionId;
> ! update_committed = false;
I vote for an Assert(!TransactionIdDidCommit(update_xid)) here.
> ! else
> ! {
> ! /*
> ! * Create a new multixact with the surviving members of the previous
> ! * one, to set as new Xmax in the tuple.
> ! *
> ! * If this is the first possibly-multixact-able operation in the
> ! * current transaction, set my per-backend OldestMemberMXactId
> ! * setting. We can be certain that the transaction will never become a
> ! * member of any older MultiXactIds than that.
> ! */
> ! MultiXactIdSetOldestMember();
> ! xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
> ! *flags |= FRM_RETURN_IS_MULTI;
> ! }
I worry that this MultiXactIdSetOldestMember() will be problematic in
longrunning vacuums like a anti-wraparound vacuum of a huge
table. There's no real need to set MultiXactIdSetOldestMember() here,
since we will not become the member of a multi. So I think you should
either move the Assert() in MultiXactIdCreateFromMembers() to it's other
callers, or add a parameter to skip it.
> ! /*
> ! * heap_prepare_freeze_tuple
> *
> * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
> ! * are older than the specified cutoff XID and cutoff MultiXactId. If so,
> ! * setup enough state (in the *frz output argument) to later execute and
> ! * WAL-log what we would need to do, and return TRUE. Return FALSE if nothing
> ! * is to be changed.
> ! *
> ! * Caller is responsible for setting the offset field, if appropriate. This
> ! * is only necessary if the freeze is to be WAL-logged.
I'd leave of that second sentence, if you want to freeze a whole page
but not WAL log it, you'd need to set offset as well...
> if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
> {
> ! else if (flags & FRM_RETURN_IS_MULTI)
> {
> ! frz->t_infomask &= ~HEAP_XMAX_BITS;
> ! frz->xmax = newxmax;
> !
> ! GetMultiXactIdHintBits(newxmax,
> ! &frz->t_infomask,
> ! &frz->t_infomask2);
> ! changed = true;
> }
I worry that all these multixact accesses will create huge performance
problems due to the inefficiency of the multixactid cache. If you scan a
huge table there very well might be millions of different multis we
touch and afaics most of them will end up in the multixactid cache. That
can't end well.
I think we need to either regularly delete that cache when it goes past,
say, 100 entries, or just bypass it entirely.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2013-12-12 01:08:41 | Re: pgsql: Fix a couple of bugs in MultiXactId freezing |
Previous Message | Tom Lane | 2013-12-11 22:23:03 | pgsql: Add a regression test case for plpython function returning setof |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2013-12-11 23:13:23 | Re: Reference to parent query from ANY sublink |
Previous Message | Peter Geoghegan | 2013-12-11 23:00:58 | Re: Why standby.max_connections must be higher than primary.max_connections? |