Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
Date: 2015-12-10 22:13:12
Message-ID: CA+TgmoYH26szcTjQS5zKN5D8AMGPyQsD2kPFZcrPUJgy70DbkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 10, 2015 at 4:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> I recently started a pgbench benchmark (to evaluate a piece of hardware,
> not postgres) with master. Unfortunately, by accident, I started
> postgres in a shell, not screen like pgbench.
>
> Just logged back in and saw:
> client 71 aborted in state 8: ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
> transaction type: TPC-B (sort of)
> scaling factor: 300
> query mode: prepared
> number of clients: 97
> number of threads: 97
> duration: 300000 s
> number of transactions actually processed: 2566862424
> latency average: 3.214 ms
> latency stddev: 7.336 ms
> tps = 30169.374133 (including connections establishing)
> tps = 30169.378406 (excluding connections establishing)
>
> Hm. Bad news. We apparently didn't keep up vacuuming. But worse news is
> that even now, days later, autovacuum hasn't progressed:
> postgres=# select txid_current();
> ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
>
> Looking at datfrozenxid:
> postgres=# select datname, datfrozenxid, age(datfrozenxid) FROM pg_database ;
> datname | datfrozenxid | age
> -----------+--------------+-----------
> template1 | 3357685367 | 0
> template0 | 3357685367 | 0
> postgres | 3159867733 | 197817634
> (3 rows)
> reveals that the launcher doesn't do squat because it doesn't think it
> needs to do anything.
>
> (gdb) p *ShmemVariableCache
> $3 = {nextOid = 24576, oidCount = 0, nextXid = 3357685367, oldestXid = 1211201715, xidVacLimit = 1411201715, xidWarnLimit = 3347685362,
> xidStopLimit = 3357685362, xidWrapLimit = 3358685362, oldestXidDB = 12380, oldestCommitTs = 0, newestCommitTs = 0,
> latestCompletedXid = 3357685366}
>
> 'oldestXid' shows the problem: We're indeed pretty short before a
> wraparound.

Is the postmaster in a "stopped" state (T)?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-12-10 22:15:40 Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
Previous Message Robert Haas 2015-12-10 22:11:43 Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)