Re: BUG #11264: Auto vacuum wraparound job blocking everything

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: dbhandary(at)switchfly(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11264: Auto vacuum wraparound job blocking everything
Date: 2014-08-27 15:26:36
Message-ID: 20140827152636.GC7046@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

dbhandary(at)switchfly(dot)com wrote:

> We recently upgraded to postgressql 9.3.5 from postgres 9.1 using
> pg_upgrade. We are running streaming replication. DB was functioning without
> any issues till today where we discovered an auto vacuum wraparound job on
> one of our tables which was blocking everything. There were no waiting jobs
> when we queried pg_stat_activity, but incoming request would just hang, so
> database was unusable.

Can you please paste

select oid::regclass, relfrozenxid, relminmxid from pg_class where relname = 'room_contract_service_code';
select datname, datfrozenxid, datminmxid from pg_database where datname = 'jetblue';

> (gdb) bt
> #0 0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> #1 0x000000000075ad7a in pg_usleep ()
> #2 0x00000000004ae2d4 in GetMultiXactIdMembers ()
> #3 0x000000000047f6cc in heap_prepare_freeze_tuple ()
> #4 0x000000000057d036 in lazy_vacuum_rel ()

This stall in GetMultiXactIdMembers should only occur when somebody is
writing the multixact immediately following. It certainly should not
sleep for long. I'm not sure what's happening here but this is probably
where the problem is. While waiting, it holds the buffer content lock
in exclusive mode.

Can you install debug symbols and pageinspect? I'm curious why this is
stalling. If the tuple has a FOR SHARE or FOR UPDATE marking from
before the upgrade, it shouldn't be inquiring the multixact members at
all. I'm curious about

Latest checkpoint's NextMultiXactId: 23431
Latest checkpoint's NextMultiOffset: 5678
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16423

It says the oldest multi is 1, so the database should not have any
values between 1 and 23431 that correspond to pg_upgrade'd multixact
values ... so what is going on here? Unless the recent mucking with
pg_upgrade to handle multixact's got something wrong.

> 10.33.11.11 | | 40735 | 2014-08-25 10:38:02.944083+00
> | 2014-08-25 10:38:02.964562+00 | 2014-08-25 10:38:22.805859+00 | 2014-08-25
> 10:38:22.805861+00 | f | active | COPY
> settings.room_contract_service_code (room_contract_service_code,
> room_contract_service_description
> , room_contract_service_code_id, supplierid, inactive) TO stdout;
>
> (gdb) bt
> #0 0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1 0x0000000000609397 in PGSemaphoreLock ()
> #2 0x000000000064e871 in LWLockAcquire ()
> #3 0x000000000047e1a9 in ?? ()
> #4 0x000000000047e66a in ?? ()
> #5 0x000000000047f0a6 in heap_getnext ()
> #6 0x00000000005343e6 in ?? ()
> #7 0x000000000053521a in DoCopy ()
> #8 0x000000000066112d in standard_ProcessUtility ()
> #9 0x00007fb96c4df261 in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so

This one is stalling on the buffer content lock trying to read the
table to dump it.

> SELECTS:
>
> query |
> | SELECT room_contract_service_code,
> room_contract_service_description, room_contract_service_code_id
> | FROM room_contract_service_code
> | WHERE room_contract_service_code.supplierid = $1
> | AND NOT inactive
> | ORDER by room_contract_service_description
> (gdb) bt
> #0 0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1 0x0000000000609397 in PGSemaphoreLock ()
> #2 0x000000000064e871 in LWLockAcquire ()
> #3 0x000000000048d05d in index_fetch_heap ()
> #4 0x000000000048d22e in index_getnext ()

And this one is stalling in the same buffer lock most likely.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2014-08-27 15:39:23 Re: BUG #11264: Auto vacuum wraparound job blocking everything
Previous Message Tom Lane 2014-08-27 15:24:53 Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns