From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Daniel Farina <daniel(at)heroku(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: txid failed epoch increment, again, aka 6291 |
Date: | 2012-09-06 10:04:06 |
Message-ID: | 20120906100406.GA2399@tornado.leadboat.com |
Lists: | pgsql-hackers |
On Tue, Sep 04, 2012 at 09:46:58AM -0700, Daniel Farina wrote:
> I might try to find the segments leading up to the overflow point and
> try xlogdumping them to see what we can see.
That would be helpful to see.
Just to grasp at yet-flimsier straws, could you post (URL preferred, else
private mail) the output of "objdump -dS" on your "postgres" executable?
> If there's anything to note about the workload, I'd say that it does
> tend to make fairly pervasive use of long running transactions which
> can span probably more than one checkpoint, and the txid reporting
> functions, and a concurrency level of about 300 or so backends ... but
> per my reading of the mechanism so far, it doesn't seem like any of
> this should matter.
Thanks for the details; I agree none of that sounds suspicious.
After some further pondering and testing, this remains a mystery to me. These
symptoms imply a proper update of ControlFile->checkPointCopy.nextXid without
having properly updated ControlFile->checkPointCopy.nextXidEpoch. After
recovery, only CreateCheckPoint() updates ControlFile->checkPointCopy at all.
Its logic for doing so looks simple and correct.
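
For readers following along, here is a minimal sketch of the epoch rule as I understand it from CreateCheckPoint() in sources of this era: the new checkpoint inherits the previous epoch and bumps it only when the 32-bit nextXid is numerically smaller than the value stored by the previous checkpoint. The struct and function names (CheckPointSketch, advance_checkpoint, wide_xid) are hypothetical stand-ins for illustration, not the actual PostgreSQL definitions, and the sketch ignores locking and the control file itself.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, simplified stand-in for the two checkPointCopy fields
 * discussed above; not the real CheckPoint struct. */
typedef struct CheckPointSketch
{
    uint32_t      nextXidEpoch;  /* number of 32-bit XID wraparounds seen */
    TransactionId nextXid;       /* next XID to assign, 32 bits wide */
} CheckPointSketch;

/* Build the next checkpoint record from the previous one and the
 * current next-XID counter. The epoch is carried forward and incremented
 * only if the counter appears to have moved backwards, which is taken to
 * mean it wrapped around since the previous checkpoint. */
CheckPointSketch
advance_checkpoint(CheckPointSketch prev, TransactionId current_next_xid)
{
    CheckPointSketch next;

    next.nextXid = current_next_xid;
    next.nextXidEpoch = prev.nextXidEpoch;
    if (next.nextXid < prev.nextXid)
        next.nextXidEpoch++;

    return next;
}

/* Combine epoch and XID into the 64-bit value that the txid reporting
 * functions are expected to expose. */
uint64_t
wide_xid(CheckPointSketch cp)
{
    return ((uint64_t) cp.nextXidEpoch << 32) | cp.nextXid;
}
```

Under this rule, a record whose nextXid has wrapped but whose nextXidEpoch has not been incremented yields a 64-bit value that jumps backwards by roughly 2^32, which matches the reported symptom. The sketch does not explain how such a record could come to be written.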