Re: Reviewing freeze map code

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-06-23 22:42:24
Message-ID: 20160623224224.ubztdvhfx6bwkphu@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-06-21 16:32:03 -0400, Robert Haas wrote:
> On Tue, Jun 21, 2016 at 3:46 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2016-06-21 15:38:25 -0400, Robert Haas wrote:
> >> On Tue, Jun 21, 2016 at 1:49 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> >> I'm also a bit dubious that LockAcquire is safe to call in general
> >> >> with interrupts held.
> >> >
> >> > Looks like we could just acquire the tuple-lock *before* doing the
> >> > toast_insert_or_update/RelationGetBufferForTuple, but after releasing
> >> > the buffer lock. That'd allow us to do avoid doing the nested locking,
> >> > should make the recovery just a goto l2;, ...
> >>
> >> Why isn't that racey? Somebody else can grab the tuple lock after we
> >> release the buffer content lock and before we acquire the tuple lock.
> >
> > Sure, but by the time the tuple lock is released, they'd have updated
> > xmax. So once we acquired that we can just do
> > if (xmax_infomask_changed(oldtup.t_data->t_infomask,
> > infomask) ||
> > !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
> > xwait))
> > goto l2;
> > which is fine, because we've not yet done the toasting.
>
> I see.
>
> > I'm not sure wether this approach is better than deleting potentially
> > toasted data though. It's probably faster, but will likely touch more
> > places in the code, and eat up a infomask bit (infomask & HEAP_MOVED
> > == HEAP_MOVED in my prototype).
>
> Ugh. That's not very desirable at all.

I'm looking into three approaches right now:

1) Flag approach from above
2) Undo toasting on concurrent activity, retry
3) Use WAL logging for the already_marked = true case.

1) primarily suffers from a significant amount of complexity. I still
have a bug in there that sometimes triggers "attempted to update
invisible tuple" ERRORs. Otherwise it seems to perform decently
performancewise - even on workloads with many backends hitting the same
tuple, the retry-rate is low.

2) Seems to work too, but due to the amount of time the tuple is not
locked, the retry rate can be really high. As we perform significant
amount of work (toast insertion & index manipulation or extending a
file) , while the tuple is not locked, it's quite likely that another
session tries to modify the tuple inbetween. I think it's possible to
essentially livelock.

3) This approach so far seems the best. It's possible to reuse the
xl_heap_lock record (in an afaics backwards compatible manner), and in
most cases the overhead isn't that large. It's of course annoying to
emit more WAL, but it's not that big an overhead compared to extending a
file, or to toasting. It's also by far the simplest fix.

Comments?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-06-23 22:59:57 Re: Reviewing freeze map code
Previous Message Tatsuo Ishii 2016-06-23 22:27:24 Re: Questionabl description in datatype.sgml