Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From: Noah Misch <noah(at)leadboat(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
Date: 2016-04-01 00:10:31
Message-ID: 20160401001031.GA1522602@tornado.leadboat.com
Lists: pgsql-hackers

On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> >> After looking into the code around recovery, ISTM that the
> >> cause is related to clearing the relation cache.
> >> In heap_xlog_visible, if the standby server receives a WAL record, the
> >> relation cache is eventually cleared in vm_extend, but if the standby
> >> server receives an FPI, the relation cache is not cleared.
> >> For example, after I applied the attached patch to HEAD (it might not be
> >> the right way), this problem seems to be resolved.
> >>
> >> Is this a bug or not?
> >
> > It's a bug. I don't expect it causes queries to return wrong answers, because
> > visibilitymap.c says "it's always safe to clear a bit in the map from
> > correctness point of view." (The bug makes a visibility map bit temporarily
> > appear to have been cleared.) I still call it a bug, because recovery
> > behavior becomes too difficult to verify when xlog replay produces conditions
> > that don't happen outside of recovery. Even if there's no way to get a wrong
> > query answer today, this would be too easy to break later. I wonder if we
> > make the same omission in other xlog replay functions. Similar omissions may
> > cause wrong query answers, even if this particular one does not.
> >
> > Would you like to bisect for the commit, or at least the major release, at
> > which the bug first appeared?
> >
> > I wonder if your discovery has any relationship to this recently-reported case
> > of insufficient smgr invalidation:
> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA(at)mail(dot)gmail(dot)com
> >
>
> I'm not sure whether this bug is related to the other issue you mentioned,
> but after further investigation, this bug seems to be reproducible even
> on much older versions.
> At least, I reproduced it on 9.0.0.

Would you try PostgreSQL 9.2.16? The visibility map was not crash safe and
had no correctness implications until 9.2. If 9.2 behaves this way, it's
almost certainly not a recent regression.
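
For readers following the thread, the failure mode discussed above (a visibility map bit that temporarily appears cleared because cached state is not invalidated during replay) can be illustrated with a toy model. This is not PostgreSQL code: ToyFork, ToyCache, and every toy_* function are hypothetical names, and the sketch assumes, for illustration only, that the stale state is a cached fork length.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical stand-in for the visibility map fork on storage. */
typedef struct {
    int  nblocks;                  /* real length of the fork */
    bool all_visible[MAX_BLOCKS];  /* one flag per block, for simplicity */
} ToyFork;

/* Hypothetical stand-in for a backend's cached view of that fork. */
typedef struct {
    ToyFork *fork;
    int      cached_nblocks;       /* -1 means "unknown, ask storage" */
} ToyCache;

/* Return the fork length, refreshing the cache only when it is unknown. */
static int toy_nblocks(ToyCache *c)
{
    if (c->cached_nblocks < 0)
        c->cached_nblocks = c->fork->nblocks;
    return c->cached_nblocks;
}

/* A block beyond the (cached) end of the fork reads as "bit not set". */
static bool toy_vm_test(ToyCache *c, int blk)
{
    if (blk >= toy_nblocks(c))
        return false;
    return c->fork->all_visible[blk];
}

/* Replay that extends the fork and sets a bit; invalidation is optional
 * so that both behaviours can be compared. */
static void toy_replay_set(ToyCache *c, int blk, bool invalidate)
{
    assert(blk >= 0 && blk < MAX_BLOCKS);
    if (blk >= c->fork->nblocks)
        c->fork->nblocks = blk + 1;
    c->fork->all_visible[blk] = true;
    if (invalidate)
        c->cached_nblocks = -1;
}

/* Cache a length of 1, replay a bit for block 3, then test that bit. */
static bool toy_demo(bool invalidate)
{
    ToyFork  fork = { .nblocks = 1 };
    ToyCache cache = { .fork = &fork, .cached_nblocks = -1 };

    (void) toy_nblocks(&cache);    /* populate the cache before replay */
    toy_replay_set(&cache, 3, invalidate);
    return toy_vm_test(&cache, 3);
}
```

In this model, toy_demo(false) reports the bit as not set even though storage has it set, while toy_demo(true) reports it as set. The stale answer is the "safe" direction, which matches the point that clearing a bit never breaks correctness, but it is still a state that would not arise outside of replay.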
