Re: pg16: invalid page/page verification failed

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg16: invalid page/page verification failed
Date: 2023-10-06 15:00:57
Message-ID: ZSAhKVk9GOO8iL6N@pryzbyj2023
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 06, 2023 at 09:20:05AM +0900, Michael Paquier wrote:
> On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > This table is what it sounds like: a partition into which CSV logs are
> > COPY'ed. It would've been created around 8am. There's no special
> > params set for the table nor for autovacuum.
>
> This may be an important bit of information. 31966b151e6a is new as
> of Postgres 16, has changed the way relations are extended and COPY
> was one area touched. I am adding Andres in CC.

Also, I realized that someone kicked off a process just after 9am which
would've done a lot of INSERT ON CONFLICT DO UPDATE, VACUUM FULL, and
VACUUM. Which consumed and dirtied buffers about 100x faster than normal.

log_time | 2023-10-05 10:00:55.794-05
pid | 31754
left | duration: 51281.001 ms statement: VACUUM (FULL,FREEZE) othertable...

log_time | 2023-10-05 10:01:01.784-05
backend_type | checkpointer
left | checkpoint starting: time

log_time | 2023-10-05 10:01:02.935-05
pid | 10023
left | page verification failed, calculated checksum 5074 but expected 5050
context | COPY postgres_log, line 947
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv

log_time | 2023-10-05 10:01:02.935-05
pid | 10023
left | invalid page in block 119 of relation base/16409/801594131
context | COPY postgres_log, line 947
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv

log_time | 2023-10-05 10:01:11.636-05
pid | 31754
left | duration: 15838.374 ms statement: VACUUM (FREEZE) othertable...

I meant to point out that the issue is on the last block.

postgres=# SELECT pg_relation_size('"BROKEN_postgres_log_2023_10_05_0900"')/8192;
?column? | 120

It sounds like there may be an issue locking (pinning?) a page, or
rather not locking it, or releasing the lock too early.

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2023-10-06 15:20:29 Re: typo in couple of places
Previous Message Aleksander Alekseev 2023-10-06 14:51:59 Re: Add const to values and nulls arguments