From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
Cc: | "Wood, Dan" <hexpert(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: VM map freeze corruption |
Date: | 2018-04-18 13:36:57 |
Message-ID: | 20180418133657.gcbg7exanyg5sglw@alvherre.pgsql |
Lists: | pgsql-hackers |
Pavan Deolasee wrote:
> On Wed, Apr 18, 2018 at 7:37 AM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> > My analysis is that heap_prepare_freeze_tuple->FreezeMultiXactId()
> > returns FRM_NOOP if the MultiXACT locked rows haven't committed. This
> > results in changed=false and totally_frozen=true (as initialized). When
> > this returns to lazy_scan_heap(), no rows are added to the frozen[] array.
> > Yet, tuple_totally_frozen is true. This means the page is marked frozen in
> > the VM, even though the MultiXACT row was left untouched.
> >
> > A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
> >      else
> >      {
> >          Assert(flags & FRM_NOOP);
> > +        totally_frozen = false;
> >      }
> >
>
> That's a great find!
Indeed.

This family of bugs (introduced by the freeze map changes in 9.6) was
initially fixed in 38e9f90a227d, but this spot was missed in that fix.

IMO the cause is the totally_frozen variable, which starts life in a
bogus state (true); the different code paths can either set it to the
right state or, by inaction, end up deciding that the initial bogus state
was correct in the first place. Errors of omission are far too easy in
that kind of model, ISTM, so I propose this slightly different patch,
which, albeit as yet untested, seems easier to verify and easier to get
right.
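
To illustrate the "errors of omission" point, here is a minimal standalone sketch in C. It is not the attached patch and not PostgreSQL code: the XmaxOutcome enum and both function names are hypothetical, made up only to contrast a flag that defaults to true with one that each code path has to set explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical outcomes of examining one tuple's xmax.  These are
 * illustrative only; they are not PostgreSQL's FRM_* flags.
 */
typedef enum
{
	XMAX_INVALID,		/* no xmax to worry about */
	XMAX_REMOVED,		/* xmax was removed, nothing left to freeze */
	XMAX_REPLACED,		/* xmax replaced by another live xid/multixact */
	XMAX_KEPT_AS_IS		/* xmax left untouched (the "no-op" case) */
} XmaxOutcome;

/*
 * Fragile style: the flag starts out true, and every path that leaves
 * something unfrozen must remember to clear it.
 */
bool
totally_frozen_default_true(XmaxOutcome outcome)
{
	bool		totally_frozen = true;

	switch (outcome)
	{
		case XMAX_INVALID:
		case XMAX_REMOVED:
			break;
		case XMAX_REPLACED:
			totally_frozen = false;
			break;
		case XMAX_KEPT_AS_IS:
			/* omission: nothing clears the flag, so it stays true */
			break;
	}
	return totally_frozen;
}

/*
 * Explicit style: the flag starts out false, and only the paths that
 * really leave the tuple frozen set it.  A forgotten path then errs on
 * the safe side.
 */
bool
totally_frozen_explicit(XmaxOutcome outcome)
{
	bool		xmax_frozen = false;

	switch (outcome)
	{
		case XMAX_INVALID:
		case XMAX_REMOVED:
			xmax_frozen = true;
			break;
		case XMAX_REPLACED:
		case XMAX_KEPT_AS_IS:
			break;
	}
	return xmax_frozen;
}
```

In this sketch the forgotten branch makes the first function wrongly report "frozen" for XMAX_KEPT_AS_IS, while the second one merely stays at its conservative default for that case.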
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
frozen.patch | text/plain | 2.2 KB |