Re: Upgrading from Postgresql 9.3.8 to 9.6.10

From: Yuri Niyazov <yuri(at)academia(dot)edu>
To: Keith <keith(at)keithf4(dot)com>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: Upgrading from Postgresql 9.3.8 to 9.6.10
Date: 2019-04-25 20:43:25
Message-ID: CACuBw0hxrOejqDNqhOhyNO9ps0_ROKKJYhG1w6weDNn++PkzBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Mon, Mar 25, 2019 at 11:44 PM Keith <keith(at)keithf4(dot)com> wrote:

> On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <yuri(at)academia(dot)edu> wrote:
>
>> We have an old cluster, running 9.3.8, that we are trying to upgrade to
>> 9.6.10.
>>
>> After running pg_upgrade and starting the server, and testing some common
>> queries against it, we experienced the following error:
>>
>> PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero
>> page at block 17021871 HINT: Please REINDEX it.
>>
>> So, if I am reading this correctly, there is an index that 9.3.8 created,
>> and it recognizes that index as OK, but 9.6.10 thinks that index is
>> corrupt. Since this happened on one index, it seems reasonable to assume
>> that this could happen on any index in that database, and we need to
>> reindex the entire database.
>>
>> Now, this is a live application, and we would like to minimize continuous
>> downtime (multiple short downtimes are fine), so what we are considering
>> doing is a manual reindex: for each index, create a new one, and then drop
>> the old one, and after all that, upgrade.
>>
>> However, we are leery of doing this reindexing using 9.3.8, since it's
>> already demonstrated itself to be unreliable.
>>
>> Which version should we use to reindex? Just the latest 9.3.X? Was there
>> a known bug with older versions missing corrupted pages in indices?
>>
>> PS: Just in case I am glaringly doing something wrong:
>>
>> Here's our mechanism of doing the upgrade: we create a streaming replica
>> from a basebackup, and then at some point turn off writes to the primary,
>> convert the replica to another primary, and then run pg_upgrade on this new
>> primary.
>>
>
>
> Have you done this more than once? If so, is it the same indexes every
> time you're having issues with?
>
> Answering these questions will help you narrow down if the corruption is
> in your original database or something to do with the upgrade.
>
> However, I would also highly recommend getting to the latest version of
> 9.3 and rebuilding your replicas afterwards. The latest patch release was
> 9.3.25, so your current version is quite far behind. There were quite a
> number of data corruption issues related to replication in the 9.3 series.
> More-so in the earlier patch versions, but I highly recommend also trying
> this before your major upgrade.
>
> Keith
>

We tried it a second time, and were unable to reproduce the issue, so it
seems like the issue is non-deterministic, which is a little scary. In any
case, we then proceeded to upgrade to 9.3.25 as you recommended, and then
upgraded to the 9.6 series from it, and the issue didn't reproduce; we've
been running on the new upgraded version for thel last two weeks and it
looks ok so far.

Thanks for your help!

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Keith 2019-04-25 22:17:05 Re: Upgrading from Postgresql 9.3.8 to 9.6.10
Previous Message Alvaro Herrera 2019-04-25 19:23:19 Re: Too many serialization errors in production