Re: Different compression methods for FPI

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Different compression methods for FPI
Date: 2022-09-05 00:23:20
Message-ID: 20220905002320.GD31833@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 02, 2022 at 06:55:11AM -0500, Justin Pryzby wrote:
> On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> > In this patch series, I added compression information to the errcontext from
> > xlog_block_info(), and allow specifying compression levels like zlib-2. I'll
> > rearrange that commit earlier if we decide that's desirable to include.
>
> 4035cd5d4 added wal_compress=lz4 and
> e9537321a added wal_compress=zstd
>
> Since 4035cd5d4, pg_waldump has shown compression info (and walinspect
> shows the same since 2258e76f9).
>
> But if you try to replay WAL on a server which wasn't compiled with
> support for the requisite compression methods, it's a bit crummy that it
> doesn't include in the error message the reason *why* it failed to
> restore the image, if that's caused by the missing compression method.

That's also hitting an elog().

2022-09-04 15:56:29.916 CDT startup[2625] FATAL: XX000: failed to restore block image
2022-09-04 15:56:29.916 CDT startup[2625] DETAIL: image at 0/1D11CB8 compressed with zstd not supported by build, block 0
2022-09-04 15:56:29.916 CDT startup[2625] CONTEXT: WAL redo at 0/1D11CB8 for Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED ; blkref #0: rel 1663/16384/2610, blk 4 FPW
2022-09-04 15:56:29.916 CDT startup[2625] LOCATION: XLogReadBufferForRedoExtended, xlogutils.c:396

(gdb) bt
#0 report_invalid_record (state=0x555555e33ff0, fmt=0x555555b1c1a8 "image at %X/%X compressed with %s not supported by build, block %d") at xlogreader.c:74
#1 0x00005555556beeec in RestoreBlockImage (record=record(at)entry=0x555555e33ff0, block_id=block_id(at)entry=0 '\000', page=page(at)entry=0x7fffee9bdc00 "") at xlogreader.c:2078
#2 0x00005555556c5d39 in XLogReadBufferForRedoExtended (record=record(at)entry=0x555555e33ff0, block_id=block_id(at)entry=0 '\000', mode=mode(at)entry=RBM_NORMAL, get_cleanup_lock=get_cleanup_lock(at)entry=false,
buf=buf(at)entry=0x7fffffffd760) at xlogutils.c:395
#3 0x00005555556c5e4a in XLogReadBufferForRedo (record=record(at)entry=0x555555e33ff0, block_id=block_id(at)entry=0 '\000', buf=buf(at)entry=0x7fffffffd760) at xlogutils.c:320
#4 0x000055555565bd5b in heap_xlog_delete (record=0x555555e33ff0) at heapam.c:9032
#5 0x00005555556622b7 in heap_redo (record=<optimized out>) at heapam.c:9836
#6 0x00005555556c15ed in ApplyWalRecord (xlogreader=0x555555e33ff0, record=record(at)entry=0x7fffee2b6820, replayTLI=replayTLI(at)entry=0x7fffffffd870) at ../../../../src/include/access/xlog_internal.h:379
#7 0x00005555556c4c30 in PerformWalRecovery () at xlogrecovery.c:1725
#8 0x00005555556b7f23 in StartupXLOG () at xlog.c:5291
#9 0x0000555555ac4491 in InitPostgres (in_dbname=in_dbname(at)entry=0x555555e09390 "postgres", dboid=dboid(at)entry=0, username=username(at)entry=0x555555dedda0 "pryzbyj", useroid=useroid(at)entry=0,
load_session_libraries=<optimized out>, override_allow_connections=override_allow_connections(at)entry=false, out_dbname=0x0) at postinit.c:731
#10 0x000055555598471f in PostgresMain (dbname=0x555555e09390 "postgres", username=username(at)entry=0x555555dedda0 "pryzbyj") at postgres.c:4085
#11 0x00005555559851b0 in PostgresSingleUserMain (argc=5, argv=0x555555de7530, username=0x555555dedda0 "pryzbyj") at postgres.c:3986
#12 0x0000555555840533 in main (argc=5, argv=0x555555de7530) at main.c:194

I guess it should be promoted to an ereport(), since it's now a
user-facing error rathere than an internal one.

$ ./tmp_install.without-zstd/usr/local/pgsql/bin/postgres -D ./src/test/regress/tmp_check/data
2022-09-04 15:28:37.446 CDT postmaster[29964] LOG: starting PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2022-09-04 15:28:37.446 CDT postmaster[29964] LOG: listening on IPv4 address "127.0.0.1", port 5432
2022-09-04 15:28:37.528 CDT postmaster[29964] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-09-04 15:28:37.587 CDT startup[29972] LOG: database system was interrupted while in recovery at 2022-09-04 15:27:44 CDT
2022-09-04 15:28:37.587 CDT startup[29972] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2022-09-04 15:28:38.010 CDT startup[29972] LOG: database system was not properly shut down; automatic recovery in progress
2022-09-04 15:28:38.038 CDT startup[29972] LOG: redo starts at 0/1D118C0
2022-09-04 15:28:38.039 CDT startup[29972] LOG: could not stat file "pg_tblspc/16502": No such file or directory
2022-09-04 15:28:38.039 CDT startup[29972] CONTEXT: WAL redo at 0/1D11970 for Tablespace/DROP: 16502
2022-09-04 15:28:38.039 CDT startup[29972] FATAL: failed to restore block image
2022-09-04 15:28:38.039 CDT startup[29972] DETAIL: image at 0/1D11CB8 compressed with zstd not supported by build, block 0
2022-09-04 15:28:38.039 CDT startup[29972] CONTEXT: WAL redo at 0/1D11CB8 for Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED ; blkref #0: rel 1663/16384/2610, blk 4 FPW

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 0cda22597fe..01c7454bcc7 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -393,7 +393,11 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
prefetch_buffer);
page = BufferGetPage(*buf);
if (!RestoreBlockImage(record, block_id, page))
- elog(ERROR, "failed to restore block image");
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("failed to restore block image"),
+ errdetail("%s", record->errormsg_buf));
+

/*
* The page may be uninitialized. If so, we can't set the LSN because

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-09-05 00:28:11 Re: Column Filtering in Logical Replication
Previous Message Tomas Vondra 2022-09-04 22:14:48 Re: POC: GROUP BY optimization