Re: BUG #13373: Imminent transaction id wrap around in spite of daily VACUUM

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: ms(at)clickware(dot)de
Cc: Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13373: Imminent transaction id wrap around in spite of daily VACUUM
Date: 2015-06-09 00:11:52
Message-ID: CA+TgmoZdOQatWcg3SsuNdGPHZijk1CcdOa96DdvJP6273WddpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, May 28, 2015 at 6:52 PM, <ms(at)clickware(dot)de> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 13373
> Logged by: Marc Schablewski
> Email address: ms(at)clickware(dot)de
> PostgreSQL version: 9.3.4
> Operating system: SuSE Enterprise Linux 11.3
> Description:
>
> Today one of our servers started complaining about getting close to the
> transaction limit and that our database "mydb" should be VACUUMed to prevent
> shutdown and transaction ID wrap around, although the database is VACUUMed
> every night and autovacuum is enabled.
>
> 2015-05-28 01:42:04 CEST ::1(44588) 40015 WARNING: database "mydb" must be
> vacuumed within 11000000 transactions
> 2015-05-28 01:42:04 CEST ::1(44588) 40015 HINT: To avoid a database
> shutdown, execute a database-wide VACUUM in that database.
> You might also need to commit or roll back old prepared
> transactions.
>
> Sadly this condition remained undetected until the automatic shutdown.
>
> Since the docs are a bit unclear on what kind of VACUUM one should run in
> this case, I started the database in single user mode and executed a regular
> VACUUM and a VACUUM FREEZE. Both commands ran for a while (~45 min) before
> returning and printing the same warning message about VACUUMing the
> database. Each time the remaining transaction count was reduced by one.
>
> As stated before, the database is VACUUMed once every night and autovacuum
> is enabled. We also do not use prepared transaction. So I wonder how the
> database got into this state.
>
> While analyzing the problem I noticed some other strange things about the
> database that might or might not be connected to our problem. There are
> still a lot (~100) of temporary tables left, even after closing all
> connections / shutting down the database.
>
> When I use this statement, taken from chapter 23.1.4. of the docs:
>
> SELECT c.oid::regclass as table_name,
> greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
> FROM pg_class c
> LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
> WHERE c.relkind = 'r';
>
> all tables have an "age" of "3", except for most of the temp tables. Their
> "age" ranges from a couple of hundreds to millions:
>
> ----
> 1: table_name = "pg_temp_3.temp_table0" (typeid = 2205, len = 4,
> typmod = -1, byval = t)
> 2: age = "2146483651" (typeid = 23, len = 4, typmod = -1, byval =
> t)
> ----
> 1: table_name = "pg_temp_48.temp_table1" (typeid = 2205, len = 4,
> typmod = -1, byval = t)
> 2: age = "901" (typeid = 23, len = 4, typmod = -1, byval = t)
> ----
>
> The highest "age" is also the "age" of the database "mydb" (statement also
> taken from chapter 23.1.4.):
>
> SELECT datname, age(datfrozenxid) FROM pg_database;
>
> ----
> 1: datname = "template1" (typeid = 19, len = 64, typmod = -1,
> byval = f)
> 2: age = "50000001" (typeid = 23, len = 4, typmod = -1, byval =
> t)
> ----
> 1: datname = "template0" (typeid = 19, len = 64, typmod = -1,
> byval = f)
> 2: age = "2068988610" (typeid = 23, len = 4, typmod = -1, byval =
> t)
> ----
> 1: datname = "postgres" (typeid = 19, len = 64, typmod = -1,
> byval = f)
> 2: age = "111784253" (typeid = 23, len = 4, typmod = -1, byval =
> t)
> ----
> 1: datname = "mydb" (typeid = 19, len = 64, typmod = -1, byval
> = f)
> 2: age = "2146483651" (typeid = 23, len = 4, typmod = -1, byval =
> t)
> ----
>
> I have to admit that I am not sure how to interpret these "age" values.
>
> Secondly, there are a lot of files in pg_clog. The oldest ones are going
> back to October 2014 and they seem to have been created continuously since
> then. The whole directory is about 512MB in size. The docs state that it
> should only be about 1/10th of that size with the default setting of
> autovacuum_freeze_max_age.
>
> All parameters relating to vacuum or autovacuum are unchanged in our
> postgresql.conf, i.e we are using their default values.
>
> The next thing I'll try is to drop all those temporary schemas and tables
> and doing another VACUUM afterwards. I suspect one of them might cause the
> issue, but I would be thankful for other ideas.

Have you checked for prepared transactions?

Dropping old temporary schemas (pg_temp_NNN) might be something to
try, although in theory autovacuum should have done that for you
automatically.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Norbert Kiam Maclang 2015-06-09 02:55:41 postgresql core dump issue
Previous Message Tom Lane 2015-06-08 15:08:34 Re: BUG #13413: pg_stat_statements don't statistics "DEALLOCATE ALL" statements