Re: Reviewing freeze map code

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-06-21 03:29:13
Message-ID: CAA4eK1Kop7jfQFUfeuSdDgZAKEm-z51Ya5zHuJXEXG7PGG6Taw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 21, 2016 at 1:03 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2016-06-15 08:56:52 -0400, Robert Haas wrote:
> > Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> > and CTID without logging anything or clearing the all-visible flag and
> > then releases the lock on the heap page to go do some more work that
> > might even ERROR out. Only if that other work all goes OK do we
> > relock the page and perform the WAL-logged actions.
> >
> > That doesn't seem like a good idea even in existing releases, because
> > you've taken a tuple on an all-visible page and made it not
> > all-visible, and you've made a page modification that is not
> > necessarily atomic without logging it.
>
> Right, that's broken.
>
>
> > I'm not sure what to do about this: this part of the heap_update()
> > logic has been like this forever, and I assume that if it were easy to
> > refactor this away, somebody would have done it by now.
>
> Well, I think generally nobody seriously looked at actually refactoring
> heap_update(), even though that'd be a good idea. But in this instance,
> the problem seems relatively fundamental:
>
> We need to lock the origin page, to do visibility checks, etc. Then we
> need to figure out the target page. Even disregarding toasting - which
> we could be doing earlier with some refactoring - we're going to have to
> release the page level lock, to lock them in ascending order. Otherwise
> we'll risk kinda likely deadlocks.
>

Can we consider to use some strategy to avoid deadlocks without releasing
the lock on old page? Consider if we could have a mechanism such that
RelationGetBufferForTuple() will ensure that it always returns a new buffer
which has targetblock greater than the old block (on which we already held
a lock). I think here tricky part is whether we can get anything like that
from FSM. Also, there could be cases where we need to extend the heap when
there were pages in heap with space available, but we have ignored them
because there block number is smaller than the block number on which we
have lock.

> We also certainly don't want to nest
> the lwlocks for the toast stuff, inside a content lock for the old
> tupe's page.
>
> So far the best idea I have - and it's really not a good one - is to
> invent a new hint-bit that tells concurrent updates to acquire a
> heavyweight tuple lock, while releasing the page-level lock. If that
> hint bit does not require any other modifications - and we don't need an
> xid in xmax for this use case - that'll avoid doing all the other
> `already_marked` stuff early, which should address the correctness
> issue.
>

Don't we need to clear such a flag in case of error? Also don't we need to
reset it later, like when modifying the old page later before WAL.

> It's kinda invasive though, and probably has performance
> implications.
>

Do you see performance implication due to requirement of heavywieht tuple
lock in more cases than now or something else?

Some others ways could be:

Before releasing the lock on buffer containing old tuple, clear the VM and
visibility info from page and WAL log it. I think this could impact
performance depending on how frequently we need to perform this action.

Have a new flag like HEAP_XMAX_UNLOGGED (as it was there when this logic
was introduced in commit f2bfe8a24c46133f81e188653a127f939eb33c4a ) and set
the same in old tuple header before releasing lock on buffer and teach
tqual.c to honor the flag. I think tqual.c should consider
HEAP_XMAX_UNLOGGED as an indication of aborted transaction unless it is
currently in-progress. Also, I think we need to clear this flag before WAL
logging in heap_update.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-06-21 03:32:01 Re: primary_conninfo missing from pg_stat_wal_receiver
Previous Message Tom Lane 2016-06-21 03:16:53 Re: parallel.c is not marked as test covered