From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
Date: | 2015-06-01 21:30:14 |
Message-ID: | 20150601213014.GA2984@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Alvaro Herrera wrote:
> Anyway here's a quick script to almost-reproduce the problem.
Meh. Really attached now.
I also wanted to post the error messages we got:
2015-05-27 16:15:17 UTC [4782]: [3-1] user=,db= LOG: entering standby mode
2015-05-27 16:15:18 UTC [4782]: [4-1] user=,db= LOG: restored log file "00000001000073DD000000AD" from archive
2015-05-27 16:15:18 UTC [4782]: [5-1] user=,db= FATAL: could not access status of transaction 4624559
2015-05-27 16:15:18 UTC [4782]: [6-1] user=,db= DETAIL: Could not read from file "pg_multixact/offsets/0046" at offset 147456: Success.
2015-05-27 16:15:18 UTC [4778]: [4-1] user=,db= LOG: startup process (PID 4782) exited with exit code 1
2015-05-27 16:15:18 UTC [4778]: [5-1] user=,db= LOG: aborting startup due to startup process failure
We pg_xlogdumped the offending segment and see that there is a
checkpoint record with oldestMulti=4624559. Curiously, there are no
other records with rmgr=MultiXact in that segment.
Note that the file exists (otherwise the error would say "could not
open" rather than "could not read"); this is also why errno says
"Success" rather than some actual error; this is the code:
errno = 0;
if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
{
slru_errcause = SLRU_READ_FAILED;
slru_errno = errno;
CloseTransientFile(fd);
return false;
}
My guess is that the file existed, and perhaps had one or more pages,
but the wanted page doesn't exist, so we tried to read but got 0 bytes
back. read() returns 0 in this case but doesn't set errno.
I didn't find a way to set things so that the file exists but is of
shorter contents than oldestMulti by the time the checkpoint record is
replayed.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
repro-chkpt-replay-failure.sh | application/x-sh | 1.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | William Dunn | 2015-06-01 22:25:10 | Re: odbc to emulate mysql for end programs |
Previous Message | Alvaro Herrera | 2015-06-01 21:19:41 | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2015-06-01 21:47:47 | Re: nested loop semijoin estimates |
Previous Message | Alvaro Herrera | 2015-06-01 21:19:41 | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |