Re: Help me recovering data

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql(at)mohawksoft(dot)com, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Kouber Saparev <postgresql(at)saparev(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Help me recovering data
Date: 2005-02-16 18:01:56
Message-ID: 25684.1108576916@sss.pgh.pa.us
Lists: pgsql-hackers

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> All in all, I figure that odds are very high that if someone isn't
> vacuuming in the rest of the transaction id space, either the transaction
> rate is high enough that 100,000 warnings may not be enough or they aren't
> going to pay attention anyway and the howitzer might not be bad.

Yeah. It's easy to imagine scenarios where the majority of the warnings
go into the bit bucket (because they are going to noninteractive client
applications that just ignore NOTICE messages). So I think it's
appropriate to be delivering the warnings for a good long time, in hopes
that someone at least occasionally fires up psql and happens to actually
see them. Something like 100K or 1M transactions feels about right
to me.

Pulling the failure trigger with 100K transactions still to go is surely
overly conservative, but compared to the size of the ID space it is not
worth noticing.

As far as the actual implementation, I was envisioning adding a limiting
XID variable and a database name variable to shared memory (protected by
the same LWLock that protects the nextXID counter). These would
be computed and loaded during the bootstrap process, right after we
finish WAL replay if any. It would probably cost us one XID to do this
(though maybe it could be done without running a real transaction? This
ties in with my thoughts about replacing GetRawDatabaseInfo with a flat
file...), but one XID per postmaster start attempt is hopefully not
gonna kill us. Subsequently, any VACUUM that updates a datfrozenxid
entry in pg_database would update these variables to reflect the new
safe limit and the name of the database with the currently oldest
datfrozenxid. This would allow a very cheap comparison during
GetNewTransactionId to see if we are near enough to generate a warning:

    WARNING: database "foo" must be vacuumed within 58372 transactions

or past the limit and generate an error:

    ERROR: database is shut down to avoid wraparound data loss in database "foo"
    HINT: Stop the postmaster and use a standalone backend to VACUUM in "foo".
In the error case, we could error out *without* advancing nextXID,
so that even automated clients continually retrying failed transactions
couldn't blow past the safety margin.

regards, tom lane
