From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
Date: | 2015-06-02 15:29:24 |
Message-ID: | CA+TgmoaAnNLS-cM7UhGMU=aaDw_LpCGdXKajP1MQApUvvpDG8A@mail.gmail.com |
Lists: | pgsql-general pgsql-hackers |
On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> But what *definitely* looks wrong to me is that a TruncateMultiXact() in
> this scenario now (since a couple weeks ago) does a
> SimpleLruReadPage_ReadOnly() in the members slru via
> find_multixact_start(). That just won't work acceptably when we're not
> yet consistent. There very well could not be a valid members segment at
> that point? Am I missing something?
Yes: that code isn't new.
TruncateMultiXact() called SimpleLruReadPage_ReadOnly() directly in
9.3.0 and every subsequent release until 9.3.7/9.4.2. The only thing
that's changed is that we've moved that logic into a function called
find_multixact_start() instead of having it directly inside
TruncateMultiXact(). We did that because we needed to use the same
logic in some other places. The reason 9.3.7/9.4.2 are causing
problems that people didn't have previously is that those new,
additional call sites were poorly chosen and didn't include adequate
protection against calling that function with an invalid input value.
What this patch is about is getting back to the situation that we were
in from 9.3.0 - 9.3.6 and 9.4.0 - 9.4.1, where TruncateMultiXact() did
the thing that you're complaining about here but no one else did.
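To illustrate what "adequate protection against calling that function
with an invalid input value" could look like, here is a minimal sketch
in C. It is a toy model, not the PostgreSQL code: ToyState,
toy_find_start() and toy_find_start_checked() are hypothetical names
invented for this example, and the range check ignores ID wraparound,
which the real multixact code has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: backing data exists only for IDs in [oldest, next). */
typedef struct
{
    uint32_t        oldest;   /* first ID that has backing data */
    uint32_t        next;     /* one past the last assigned ID */
    const uint32_t *offsets;  /* offsets[i] belongs to ID oldest + i */
} ToyState;

/*
 * Unchecked lookup: the caller must already know that the ID is valid.
 * This stands in for a routine that reads a page which may not exist.
 */
static uint32_t
toy_find_start(const ToyState *st, uint32_t id)
{
    assert(id >= st->oldest && id < st->next);
    return st->offsets[id - st->oldest];
}

/*
 * Guarded wrapper for call sites that cannot vouch for their input.
 * Returns false and leaves *result untouched when the ID is out of range.
 */
static bool
toy_find_start_checked(const ToyState *st, uint32_t id, uint32_t *result)
{
    if (id < st->oldest || id >= st->next)
        return false;
    *result = toy_find_start(st, id);
    return true;
}
```

A call site that cannot be sure its ID is valid would go through the
checked variant and decide what to do on false, rather than letting the
lookup fail on data that does not exist.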
I think that you are absolutely right to question what's going on in
TruncateMultiXact(). It's kooky, and there may well be bugs buried
there. But I don't think fixing that
should be the priority right now, because we have zero reports of
problems attributable to that logic. I think the priority should be
on undoing the damage that we did in 9.3.7/9.4.2, when we made other
places do the same thing. We started getting trouble reports
attributable to those changes *almost immediately*, which means that
whether or not TruncateMultiXact() is broken, these new call sites
definitely are. I think we really need to fix those new places ASAP.
> I think at the very least we'll have to skip this step while not yet
> consistent. That really sucks, because we'll possibly end up with
> multixacts that are completely filled by the time we've reached
> consistency.
That would be a departure from the behavior of every existing release
that includes this code, and it would be based on, to my knowledge,
zero trouble reports.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company