Quick Links

Re: "PANIC: could not open critical system index 2662" - twice

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Evgeny Morozov <postgresql3(at)realityexists(dot)net>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: "PANIC: could not open critical system index 2662" - twice
Date:	2023-05-07 10:30:52
Message-ID:	CA+hUKGJ5zz8a6-D5ZQcoftZbAeBDs_h_3NL87TndKhW_2_aw1Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Sun, May 7, 2023 at 1:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Did you previously run this same workload on versions < 15 and never
> > see any problem? 15 gained a new feature CREATE DATABASE ...
> > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a
> > bug somewhere near that, though I have no specific idea.
>
> Per the release notes I was just writing ...
>
> <listitem>
> 
> <para>
> Fix potential corruption of the template (source) database after
> <command>CREATE DATABASE</command> with the <literal>STRATEGY
> WAL_LOG</literal> option (Nathan Bossart, Ryo Matsumura)
> </para>

Hmm. That bug seems to have caused corruption (backwards time travel)
of blocks in the *source* DB's pg_class, by failing to write back
changes. We seem to have zeroed pages in the *target* database, for
all catalogs (apparently everything copied by
RelationCopyStorageUsingBuffer()), even though the template is still
fine. It is as if RelationCopyStorageUsingBuffer() created the
zero-filed file with smgrextend(), but then the buffer data was never
written out even though we memcpy'd it into the a buffer and set the
buffer dirty.

Bug-in-PostgreSQL explanations could include that we forgot it was
dirty, or some backend wrote it out to the wrong file; but if we were
forgetting something like permanent or dirty, would there be a more
systematic failure? Oh, it could require special rare timing if it is
similar to 8a8661828's confusion about permanence level or otherwise
somehow not setting BM_PERMANENT, but in the target blocks, so I think
that'd require a checkpoint AND a crash. It doesn't reproduce for me,
but perhaps more unlucky ingredients are needed.

Bug-in-OS/FS explanations could include that a whole lot of writes
were mysteriously lost in some time window, so all those files still
contain the zeroes we write first in smgrextend(). I guess this
previously rare (previously limited to hash indexes?) use of sparse
file hole-punching could be a factor in an it's-all-ZFS's-fault
explanation:

openat(AT_FDCWD,"base/16390/2662",O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC,0600)
= 36 (0x24)
openat(AT_FDCWD,"base/1/2662",O_RDWR|O_CLOEXEC,00) = 37 (0x25)
lseek(37,0x0,SEEK_END) = 32768 (0x8000)
lseek(37,0x0,SEEK_END) = 32768 (0x8000)
pwrite(36,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,8192,0x6000) = 8192
(0x2000) <-- smgrextend(final block)
lseek(36,0x0,SEEK_END) = 32768 (0x8000)

I was trying to think about how I might go about trying to repro the
exact system setup. Evgeny, do you mind sharing your "zfs get all
/path/to/pgdata" (curious to see block size, compression settings,
anything else etc) and your postgresql.conf? And your exact Ubuntu
kernel version and ZFS package versions?

In response to

Re: "PANIC: could not open critical system index 2662" - twice at 2023-05-07 01:21:30 from Tom Lane

Responses

Re: "PANIC: could not open critical system index 2662" - twice at 2023-05-07 16:10:28 from Evgeny Morozov
Re: "PANIC: could not open critical system index 2662" - twice at 2023-05-08 02:24:48 from Michael Paquier

Browse pgsql-general by date

	From	Date	Subject
Next Message	Marc Millas	2023-05-07 11:46:58	Re: Death postgres
Previous Message	Tom Lane	2023-05-07 01:21:30	Re: "PANIC: could not open critical system index 2662" - twice