Re: BUG #16894: PANIC: WAL contains references to invalid pages

From: David Steele <david(at)pgmasters(dot)net>
To: Антон Курочкин <antkurochkin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16894: PANIC: WAL contains references to invalid pages
Date: 2021-02-25 17:53:44
Message-ID: fd63fe44-7858-5b0a-1709-b9a2da11e36c@pgmasters.net
Lists: pgsql-bugs

On 2/24/21 7:22 AM, Антон Курочкин wrote:
>
> We got error when PITR restoring individual databases from a backup
> using pgbackrest.
>
> Opened issue in pgbackrest repository:
> https://github.com/pgbackrest/pgbackrest/issues/796
> It would be great if you would look at this problem. All the
> information is in the issue.
>
> Pgbackrest developers offered to provide this information to the
> postgres folks.
>
> Turned out that the problem is playing WAL files on databases that are
> restored with zero files.
> The database files from the backup are restored successfully, but an
> error occurs when starting PG and playing WAL:

<snip>

> < 2021-02-21 10:35:09.050 UTC > WARNING: page 22659 of relation
> base/17221/17557 is uninitialized
> < 2021-02-21 10:35:09.050 UTC > CONTEXT: xlog redo at 7E/8D9C04C0 for
> Heap2/VISIBLE: cutoff xid 810424291 flags 1
> < 2021-02-21 10:35:09.050 UTC > PANIC: WAL contains references to
> invalid pages
> < 2021-02-21 10:35:09.050 UTC > CONTEXT: xlog redo at 7E/8D9C04C0 for
> Heap2/VISIBLE: cutoff xid 810424291 flags 1

For context, when a user performs a selective restore, all the relations
for databases *not* selected are restored as sparse zero files with the
correct size. WAL replay will update those relations, but they are not
expected to be consistent, and in fact we make it impossible to log on to
those databases, so the only option is to drop them.
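To make "sparse zero files with the correct size" concrete, here is a
minimal sketch in Python. It is not pgBackRest's implementation; the
function name `create_zeroed_relation` and the 8192-byte block size
(PostgreSQL's default) are assumptions made for illustration. It relies
on truncate() extending an empty file with zero bytes, which holds on
most platforms.

```python
import os
import tempfile

BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def create_zeroed_relation(path: str, size_bytes: int) -> None:
    """Create a file of size_bytes that reads back as zeros (hypothetical helper)."""
    with open(path, "wb") as f:
        # truncate() extends the empty file without writing any data; on
        # filesystems that support sparse files, no blocks are allocated.
        f.truncate(size_bytes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rel = os.path.join(tmp, "17557")
        create_zeroed_relation(rel, 3 * BLCKSZ)
        print(os.path.getsize(rel))
```

Every page in such a file is all zeros, which is why replay sees the
pages of the unselected databases as uninitialized.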

Still, this is a fairly efficient way to get at a single database in a
cluster that contains hundreds or thousands of databases.

Clearly the methodology is a bit unorthodox, but even so we don't expect
to see an error here. This is the first time we've gotten a detailed
analysis of the issue, so I haven't had time to really look into it, but
perhaps there is a full-page write missing? In the case of corruption,
the administrator might zero a page, and we expect WAL replay to succeed,
I think.
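For anyone who wants to confirm that a given block really is a zeroed
page, a rough check can be sketched in Python. This is an illustration
only, assuming the default 8192-byte block size and the standard page
header layout, where pd_upper sits at byte offset 14; PostgreSQL's
PageIsNew test treats a page as uninitialized when pd_upper is zero. The
helper names are hypothetical. With these assumptions, page 22659 from
the log above starts at byte 22659 * 8192 = 185622528 of the first
segment file.

```python
import struct

BLCKSZ = 8192          # assumed: default PostgreSQL block size
PD_UPPER_OFFSET = 14   # pd_lsn (8) + pd_checksum (2) + pd_flags (2) + pd_lower (2)


def page_is_new(page: bytes) -> bool:
    """Return True if pd_upper is zero, i.e. the page looks uninitialized."""
    # Byte order does not matter for a comparison against zero.
    (pd_upper,) = struct.unpack_from("<H", page, PD_UPPER_OFFSET)
    return pd_upper == 0


def read_page(path: str, block_no: int) -> bytes:
    """Read one block from a relation segment file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(block_no * BLCKSZ)
        page = f.read(BLCKSZ)
    if len(page) != BLCKSZ:
        raise ValueError(f"block {block_no} is beyond the end of {path}")
    return page
```

For example, page_is_new(read_page("base/17221/17557", 22659)) would be
expected to return True on the restored cluster, given the WARNING
quoted above.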

Regards,
--
-David
david(at)pgmasters(dot)net
