From: | bricklen <bricklen(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Natalie Wenz <nataliewenz(at)ebureau(dot)com>, pgsql-admin <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: pg_upgrade and frozen xids |
Date: | 2018-03-07 20:21:23 |
Message-ID: | CAGrpgQ9apRxeCng82nd0qwD7bKtNPebT8XtTcC0NxddBgcUnNA@mail.gmail.com |
Lists: | pgsql-admin |
On Wed, Mar 7, 2018 at 12:01 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I happen to know that bricklen already ran amcheck. There were errors,
> but they were not consistent with a collation issue. Rather, it looked
> like something was up with the storage layer -- the sibling links of a
> pair of pages were not in mutual agreement.
>
> Even if that wasn't something that I knew already, I still would not
> suspect opclass misbehavior of any variety. VACUUM doesn't care about
> the ordering of items on the page in the case of nbtree. And, it
> performs a physical order scan there (albeit with some extra trickery
> to prevent races due to concurrent splits). Index tuples that could
> end up being unreachable to index scans due to opclass misbehavior
> should remain reachable to VACUUM.
>
What little detail I've been able to collect so far is below. All of it is from
10.1 clusters.
The following is from the postgres logs for 6 different databases (across 3 geo
regions, of which two were on the same hypervisor). Each error was discovered
when autovacuum tried to vacuum the affected database:
ERROR: could not find left sibling of block 4775 in index "<some index>"
ERROR: right sibling 13983 of block 7196 is not next child 7246 of block 5208 in index "<some index>"
ERROR: right sibling 60252 of block 60115 is not next child 60118 of block 60113 in index "<some index>"
ERROR: right sibling 93058 of block 93057 is not next child 93061 of block 93008 in index "<some index>"
ERROR: right sibling 10081 of block 10079 is not next child 10084 of block 10046 in index "<some index>"
ERROR: left link changed unexpectedly in block 13868 of index "<some index>"
ERROR: right sibling 145 of block 92 is not next child 93 of block 3 in index "<some index>"
A strace from the hung autovac process (before we killed it):
futex(0x7f07b8f575f8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f07b8f575f8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f07b8f575f8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
...
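
For readers following the thread, here is a rough illustration of what "sibling links not in mutual agreement" means. It is a toy in-memory model, not PostgreSQL's nbtree code and not what amcheck actually runs: the `Page` class, the `check_sibling_links` function and the block numbers are all hypothetical, chosen only to show the invariant that a page's right sibling should point back to it through its left link.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    """Toy stand-in for a B-tree page: only the sibling pointers are modeled."""
    left: Optional[int]   # block number of the left sibling, None at the leftmost page
    right: Optional[int]  # block number of the right sibling, None at the rightmost page


def check_sibling_links(pages: Dict[int, Page]) -> List[str]:
    """Return one message per page whose right sibling does not point back to it."""
    problems = []
    for block, page in sorted(pages.items()):
        if page.right is None:
            continue  # rightmost page on its level, nothing to cross-check
        sibling = pages.get(page.right)
        if sibling is None:
            problems.append(f"block {block}: right sibling {page.right} does not exist")
        elif sibling.left != block:
            problems.append(
                f"block {block}: right sibling {page.right} has left link {sibling.left}"
            )
    return problems


# Hypothetical level of three pages where block 2 carries a stale left link.
level = {
    1: Page(left=None, right=2),
    2: Page(left=7, right=3),
    3: Page(left=2, right=None),
}
for message in check_sibling_links(level):
    print(message)
```

In this made-up level, block 1 says its right sibling is block 2, but block 2's left link points elsewhere, so the check reports block 1. A level whose links all agree yields an empty list.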