From: | Dan Langille <info1(at)dvl-software(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: could not truncate directory "pg_subtrans": apparent wraparound |
Date: | 2015-06-06 17:13:59 |
Message-ID: | CAPG9OKf=8cXfjsu-2g=mk46yvwRXT11dkDn16GsSjVKODLqCnw@mail.gmail.com |
Lists: | pgsql-hackers |
If there's anything I can try on my servers to help diagnose the issues,
please let me know. If desired, I can arrange access for debugging.
On Sat, Jun 6, 2015 at 12:51 AM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
> > Thomas Munro wrote:
> >
> >> My idea was that if I could get oldestXact == next XID in
> >> TruncateSUBTRANS, then TransactionIdToPage(oldestXact) for a value of
> >> oldestXact that happens to be immediately after a page boundary (so
> >> that xid % 2048 == 0) might give a page number that is >=
> >> latest_page_number, causing SimpleLruTruncate to print that message.
> >> But I can't figure out how to get next XID == oldest XID, because
> >> vacuumdb --freeze --all consumes xids itself, so in my first attempt
> >> at this, next XID is always 3 ahead of the oldest XID when a
> >> checkpoint is run.
> >
> > vacuumdb starts by querying pg_database, which eats one XID.
> >
> > Vacuum itself only uses one XID when vac_truncate_clog() is called.
> > This is called from vac_update_datfrozenxid(), which always happens at
> > the end of each user-invoked VACUUM (so three times for vacuumdb if you
> > have three databases); autovacuum does it also at the end of each run.
> > Maybe you can get autovacuum to quit before doing it.
> >
> > OTOH, if the values in the pg_database entry do not change,
> > vac_truncate_clog is not called, and thus vacuum would finish without
> > consuming an XID.
>
> I have managed to reproduce it a few times but haven't quite found the
> right synchronisation hacks to make it reliable so I'm not posting a
> repro script yet.
>
> I think it's a scary sounding message but very rare and entirely
> harmless (unless you really have wrapped around...). The fix is
> probably something like: if oldest XID == next XID, then just don't
> call SimpleLruTruncate (truncation is deferred until the next
> checkpoint), or perhaps (if we can confirm this doesn't cause problems
> for dirty pages or that there can't be any dirty pages before cutoff
> page because of the preceding flush (as I suspect)) we could use
> cutoffPage = TransactionIdToPage(oldestXact - 1) if oldest == next, or
> maybe even always.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
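
For readers following the quoted reasoning, the page arithmetic can be sketched in C roughly as below. This is a simplified model, not the PostgreSQL source: the names xid_to_page, page_precedes and truncation_refused are hypothetical stand-ins, the 2048 XIDs per page comes from the "xid % 2048 == 0" remark above, and the model assumes that the latest page is the one holding the last XID actually assigned (next XID - 1). It deliberately ignores everything else the real functions do.

```c
#include <stdint.h>

/* 2048 XIDs per page, taken from the "xid % 2048 == 0" remark in the quoted text. */
#define XACTS_PER_PAGE 2048u

/* Page that holds a given XID (hypothetical stand-in for TransactionIdToPage). */
static int xid_to_page(uint32_t xid)
{
    return (int) (xid / XACTS_PER_PAGE);
}

/* Wraparound-aware test: is page1 logically older than page2 (modulo 2^32)? */
static int page_precedes(int page1, int page2)
{
    uint32_t xid1 = (uint32_t) page1 * XACTS_PER_PAGE;
    uint32_t xid2 = (uint32_t) page2 * XACTS_PER_PAGE;

    return (int32_t) (xid1 - xid2) < 0;
}

/*
 * Returns 1 if the modeled check would refuse to truncate, i.e. the case
 * that logs "apparent wraparound".
 *
 * ASSUMPTION: the latest page is the page of the last assigned XID
 * (next_xid - 1), so a page for next_xid itself does not exist yet.
 *
 * use_prev_xid = 0 models cutoffPage = page(oldest_xid);
 * use_prev_xid = 1 models the suggested cutoffPage = page(oldest_xid - 1).
 */
static int truncation_refused(uint32_t oldest_xid, uint32_t next_xid,
                              int use_prev_xid)
{
    int latest_page = xid_to_page(next_xid - 1);
    int cutoff_page = xid_to_page(use_prev_xid ? oldest_xid - 1 : oldest_xid);

    return page_precedes(latest_page, cutoff_page);
}
```

Under these assumptions, truncation_refused(4096, 4096, 0) is 1, because the oldest XID equals the next XID and sits right on a page boundary. truncation_refused(4096, 4096, 1), which uses the "oldestXact - 1" variant, is 0, and so is truncation_refused(4096, 4099, 0), where the next XID is 3 ahead as in the vacuumdb attempt described above.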