Re: "PANIC: could not open critical system index 2662" - twice

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Evgeny Morozov <postgresql3(at)realityexists(dot)net>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: "PANIC: could not open critical system index 2662" - twice
Date: 2023-05-07 10:30:52
Message-ID: CA+hUKGJ5zz8a6-D5ZQcoftZbAeBDs_h_3NL87TndKhW_2_aw1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, May 7, 2023 at 1:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Did you previously run this same workload on versions < 15 and never
> > see any problem? 15 gained a new feature CREATE DATABASE ...
> > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a
> > bug somewhere near that, though I have no specific idea.
>
> Per the release notes I was just writing ...
>
> <listitem>
> <!--
> Author: Michael Paquier <michael(at)paquier(dot)xyz>
> Branch: master [8a8661828] 2023-02-22 10:14:52 +0900
> Branch: REL_15_STABLE [fa5dd460c] 2023-02-22 10:14:56 +0900
> -->
> <para>
> Fix potential corruption of the template (source) database after
> <command>CREATE DATABASE</command> with the <literal>STRATEGY
> WAL_LOG</literal> option (Nathan Bossart, Ryo Matsumura)
> </para>

Hmm. That bug seems to have caused corruption (backwards time travel)
of blocks in the *source* DB's pg_class, by failing to write back
changes. We seem to have zeroed pages in the *target* database, for
all catalogs (apparently everything copied by
RelationCopyStorageUsingBuffer()), even though the template is still
fine. It is as if RelationCopyStorageUsingBuffer() created the
zero-filed file with smgrextend(), but then the buffer data was never
written out even though we memcpy'd it into the a buffer and set the
buffer dirty.

Bug-in-PostgreSQL explanations could include that we forgot it was
dirty, or some backend wrote it out to the wrong file; but if we were
forgetting something like permanent or dirty, would there be a more
systematic failure? Oh, it could require special rare timing if it is
similar to 8a8661828's confusion about permanence level or otherwise
somehow not setting BM_PERMANENT, but in the target blocks, so I think
that'd require a checkpoint AND a crash. It doesn't reproduce for me,
but perhaps more unlucky ingredients are needed.

Bug-in-OS/FS explanations could include that a whole lot of writes
were mysteriously lost in some time window, so all those files still
contain the zeroes we write first in smgrextend(). I guess this
previously rare (previously limited to hash indexes?) use of sparse
file hole-punching could be a factor in an it's-all-ZFS's-fault
explanation:

openat(AT_FDCWD,"base/16390/2662",O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC,0600)
= 36 (0x24)
openat(AT_FDCWD,"base/1/2662",O_RDWR|O_CLOEXEC,00) = 37 (0x25)
lseek(37,0x0,SEEK_END) = 32768 (0x8000)
lseek(37,0x0,SEEK_END) = 32768 (0x8000)
pwrite(36,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,8192,0x6000) = 8192
(0x2000) <-- smgrextend(final block)
lseek(36,0x0,SEEK_END) = 32768 (0x8000)

I was trying to think about how I might go about trying to repro the
exact system setup. Evgeny, do you mind sharing your "zfs get all
/path/to/pgdata" (curious to see block size, compression settings,
anything else etc) and your postgresql.conf? And your exact Ubuntu
kernel version and ZFS package versions?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Millas 2023-05-07 11:46:58 Re: Death postgres
Previous Message Tom Lane 2023-05-07 01:21:30 Re: "PANIC: could not open critical system index 2662" - twice