Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Davinder Singh <davinder(dot)singh(at)enterprisedb(dot)com>, Dilip Kumar <dilip(dot)kumar(at)enterprisedb(dot)com>
Subject: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
Date: 2024-11-21 10:44:26
Message-ID: CAKZiRmyM4YnokK6Oenw5JKwAQ3rhP0YTz2T-tiw5dAQjGRXE3Q@mail.gmail.com
Lists: pgsql-hackers

Hi, I'm sharing with the community what seems to be a new bug in
pg_basebackup that affects only WIN32. It was reproduced based on a
3rd-party customer case. Maybe someone has hints (internally, EDB is
just about to start looking into this too):

C:\backup>"C:\Program Files\PostgreSQL\15\bin\pg_basebackup" -U postgres -D
"backup2" -F t -P -X stream -c fast --compress=none --create-slot
--slot=slot1
Password:
pg_basebackup: error: could not close file "00000001000000020000007D":
Invalid argument
pg_basebackup: error: background process terminated unexpectedly

C:\backup>systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.26100 N/A Build 26100

C:\backup>"C:\Program Files\PostgreSQL\15\bin\pg_basebackup" --version

pg_basebackup (PostgreSQL) 15.8 // PostgreSQL 15.8, compiled by Visual C++
build 1941, 64-bit

C:\backup>dir "c:\Program Files\PostgreSQL\15\data\pg_wal"
[..]
11/20/2024 08:26 AM 16,777,216 0000000100000002000000C2
11/20/2024 08:26 AM 16,777,216 0000000100000002000000C3
11/20/2024 08:27 AM 16,777,216 0000000100000002000000C4
11/20/2024 08:27 AM 16,777,216 0000000100000002000000C5
11/20/2024 08:27 AM 16,777,216 0000000100000002000000C6
10/29/2024 10:25 AM <DIR> archive_status
706 File(s) 11,844,714,496 bytes <<<------ 11GB WALs
3 Dir(s) 246,568,095,744 bytes free

C:\backup>dir backup2
[..]
11/20/2024 08:24 AM 3,740,529,664 base.tar
11/20/2024 08:24 AM 2,147,614,208 pg_wal.tar <<<---BUG triggers
2 File(s) 5,888,143,872 bytes
2 Dir(s) 246,568,091,648 bytes free

Note: 2^31 - 2147614208 = -130560, i.e. pg_wal.tar ends 130560 bytes past
the 2^31 boundary.
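
The arithmetic in the note can be double-checked with a short sketch. It
also shows what the final file size would look like if it were ever stored
in a signed 32-bit integer. That such a truncation happens somewhere in the
write path is only a hypothesis here, and the helper name wrap_int32 is
made up for illustration.

```python
# Sanity check of the note above: how far past 2^31 does pg_wal.tar end?
# Hypothesis only: some offset in the write path may be a signed 32-bit value.

PG_WAL_TAR_SIZE = 2_147_614_208   # size of pg_wal.tar from the dir listing
LIMIT_2_31 = 2 ** 31              # first value a signed 32-bit int cannot hold


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


overshoot = PG_WAL_TAR_SIZE - LIMIT_2_31
print("bytes past 2^31:", overshoot)
print("size as signed 32-bit:", wrap_int32(PG_WAL_TAR_SIZE))
```

A negative result from wrap_int32 would be an invalid file offset, which
would at least be compatible with an "Invalid argument" error.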

--- how to reproduce:
1. Install PostgreSQL on Windows.
2. alter system set max_wal_size='3GB';
3. select pg_reload_conf();
4. create and load a big table (~25GB) with TOAST compression disabled to
generate more WAL:
create table t (id bigint, t text);
alter table t alter column t set storage external;
insert into t select i::bigint as id, repeat(md5(i::text),4000)::text as r
from generate_series(1, 2000000) s(i);

5. While the above is running, run pg_basebackup.exe with -F tar mode in
parallel so that it has a chance to create a >2GB pg_wal.tar file

Please see the screenshot from strace/Sysinternals: the problem seems to
come from a WriteFile() failure near the 2^31 offset (casting issue?
integer overflow?) -- marker (5) -- when writing to pg_wal.tar, which then
triggers the unhelpful error message for close()/CloseFile() ==
E_INVALID_ARG / ERROR_INVALID_HANDLE -- marker (1). Also notice that near
marker (1), after writing 40960 bytes, it goes backwards (lseeks back by
2147573248-2130836480=16736768 bytes) and writes a final 512 bytes
(header?) -- near marker (5), just before issuing the
QueryStandardInformationFile() call (stat()?), which is probably just part
of the error handling (?). It really looks as if off_t/size_t were limited
to 2^31 somewhere. All of this is on standard NTFS and affects only
pg_wal.tar, not e.g. base.tar (in the original report it was writing at
offsets ~51GB fine).
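
The offsets quoted above are consistent with a plain tar layout, which can
be checked with a small sketch. It assumes (this is not confirmed by the
trace itself) that the 512-byte write is the tar header of the last WAL
segment, located at offset 2130836480 and followed by one 16MB segment, and
that 2147573248 is the start offset of the final 40960-byte write.

```python
# Consistency check of the offsets seen in the trace.
# Assumptions: the 512-byte write is the tar header of the last WAL segment,
# it sits at offset 2130836480, and the segment data follows it directly.

TAR_HEADER_SIZE = 512
WAL_SEGMENT_SIZE = 16_777_216      # segment size from the pg_wal listing
HEADER_OFFSET = 2_130_836_480      # lseek target seen in the trace
LAST_WRITE_OFFSET = 2_147_573_248  # assumed start of the final data write
LAST_WRITE_SIZE = 40_960           # size of the final data write
PG_WAL_TAR_SIZE = 2_147_614_208    # size of pg_wal.tar from the dir listing

seek_back = LAST_WRITE_OFFSET - HEADER_OFFSET
end_after_write = LAST_WRITE_OFFSET + LAST_WRITE_SIZE
end_of_member = HEADER_OFFSET + TAR_HEADER_SIZE + WAL_SEGMENT_SIZE

print("seek back distance:", seek_back)
print("end after last write:", end_after_write)
print("end of tar member:", end_of_member)
print("header below 2^31:", HEADER_OFFSET < 2 ** 31)
print("member end above 2^31:", end_of_member > 2 ** 31)
```

Under these assumptions the header lies below 2^31 while the end of the
segment lies above it, so seeking back to the header could succeed while
returning to the end of the file would need an offset that no longer fits
in a signed 32-bit integer.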

-J.

Attachment: DB_2834_sysinternals.png (image/png, 182.4 KB)
