From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | 9.3: more problems with "Could not open file "pg_multixact/members/xxxx" |
Date: | 2014-07-15 22:58:35 |
Message-ID: | CAMkU=1wX9eUumStJODnigW6kB==aNJv5jCUwybzRMNi=Qajs1w@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:
> Jeff Janes wrote:
>
> > This problem was initially fairly easy to reproduce, but since I
> > started adding instrumentation specifically to catch it, it has become
> > devilishly hard to reproduce.
> >
> > I think my next step will be to also log each of the values that go
> > into the complex if (...) expression that decides on the deletion.
>
> Could you please try to reproduce it after updating to latest? I pushed
> fixes that should close these issues. Maybe you want to remove the
> instrumentation you added, to make failures more likely.
>
There are still some problems in 9.4, but I haven't been able to diagnose
them and wanted to do more research first. The announcement of upcoming
back-branch releases for 9.3 spurred me to try the test there, and I have
problems with 9.3 (12c5bbdcbaa292b2a4b09d298786) as well. Moving truncation
to the checkpoint seems to have made the problem easier to reproduce. On an
8-core machine, this test fell over after about 20 minutes, which is much
faster than it usually reproduces.
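Truncation has to decide which members segment files are still live and which can be deleted, and because member offsets are 32-bit and wrap around, the live range can straddle the end of the offset space. The Python sketch below is a simplified, hypothetical illustration of that kind of circular-range test, not the actual if (...) expression in multixact.c. The helper names are invented, and the constants assume the 9.3 layout with the default 8 kB block size (1636 members per page, 32 pages per segment).

```python
# Assumed 9.3 defaults: 1636 members per 8 kB page, 32 pages per segment file.
MEMBERS_PER_SEGMENT = 1636 * 32
OFFSET_SPACE = 2**32  # member offsets are 32-bit and wrap around


def seg_of(offset):
    """Segment number holding a (possibly wrapped) member offset."""
    return (offset % OFFSET_SPACE) // MEMBERS_PER_SEGMENT


def segment_is_live(seg, oldest_offset, next_offset):
    """Hypothetical check: is seg inside the circular live range of offsets?"""
    o, n = oldest_offset % OFFSET_SPACE, next_offset % OFFSET_SPACE
    start, end = seg_of(o), seg_of(n)
    if o <= n:
        # Plain range, no wraparound.
        return start <= seg <= end
    # The live range wraps past the last segment back to segment 0.
    return seg >= start or seg <= end


def segments_to_delete(existing, oldest_offset, next_offset):
    """Segments a truncation pass could remove: those outside the live range."""
    return sorted(s for s in existing
                  if not segment_is_live(s, oldest_offset, next_offset))
```

Comparing the raw offsets rather than the segment numbers is what lets the sketch treat a range whose oldest and next offsets share a segment, but sit on opposite sides of the wrap point, as covering every segment.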
This is the error I get:
2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR: could not access status of transaction 85837221
2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL: Could not open file "pg_multixact/members/14031": No such file or directory.
2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."foo_parent" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
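The file name in the DETAIL line is an SLRU segment number printed in hexadecimal, so it can be mapped back to a range of member offsets. The sketch below assumes the 9.3 members layout (20-byte groups of 4 flag bytes plus 4 xids, giving 409 groups and 1636 members per 8 kB page) and 32 pages per segment file. The function names are hypothetical, and the default block size is an assumption, not something stated in the report.

```python
# Assumed build defaults (not stated in the report): 8 kB pages, 32 pages/segment.
BLCKSZ = 8192
SLRU_PAGES_PER_SEGMENT = 32

# Assumed 9.3 member layout: each 20-byte group holds 4 flag bytes plus 4 xids.
XID_SIZE = 4
MEMBERS_PER_GROUP = 4
FLAG_BYTES_PER_GROUP = 4
GROUP_SIZE = XID_SIZE * MEMBERS_PER_GROUP + FLAG_BYTES_PER_GROUP  # 20 bytes
MEMBERS_PER_PAGE = (BLCKSZ // GROUP_SIZE) * MEMBERS_PER_GROUP      # 409 * 4
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def member_segment(offset):
    """Segment number of the members file that stores a 32-bit member offset."""
    page = offset // MEMBERS_PER_PAGE
    return page // SLRU_PAGES_PER_SEGMENT


def member_file_name(segno):
    """Segment file name: at least 4 uppercase hex digits, as in 9.3's SLRU code."""
    return "pg_multixact/members/%04X" % segno


def segment_offset_range(name):
    """Inclusive (first, last) member offsets stored in a segment with a hex name."""
    segno = int(name, 16)
    first = segno * MEMBERS_PER_SEGMENT
    return first, first + MEMBERS_PER_SEGMENT - 1


if __name__ == "__main__":
    first, last = segment_offset_range("14031")
    print(member_file_name(member_segment(first)), first, last)
    print("offsets left before the 2**32 wrap:", 2**32 - first)
```

Under those assumptions, members/14031 holds offsets 4,291,241,088 through 4,291,293,439, within about 3.7 million offsets of the 2^32 point where the member offset counter wraps around.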
The testing harness is attached: 3 patches, which must be applied to the
test server, and 2 scripts. The script do.sh sets up the database (using
fixed paths, so be careful) and then invokes count.pl in a loop to do the
actual work.
Cheers,
Jeff
Attachment | Content-Type | Size |
---|---|---|
0002-pg_burn_multixact-utility.patch | application/octet-stream | 7.0 KB |
count.pl | application/octet-stream | 9.7 KB |
crash_REL9_4_BETA1.patch | application/octet-stream | 12.6 KB |
do.sh | application/x-sh | 3.6 KB |
member_delete_log.patch | application/octet-stream | 999 bytes |