From: | Evgeny Morozov <postgresql3(at)realityexists(dot)net> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: "PANIC: could not open critical system index 2662" - twice |
Date: | 2023-05-07 16:10:28 |
Message-ID: | 01020187f6fa8f05-d1bd9975-48ec-4d8d-9ab7-75478400100d-000000@eu-west-1.amazonses.com |
Lists: | pgsql-general |
On 6/05/2023 11:13 pm, Thomas Munro wrote:
> Did you previously run this same workload on versions < 15 and never
> see any problem?
Yes, kind of. We have a test suite that creates one test DB and runs a
bunch of tests on it. Two of these tests, however, each create another DB
(also by cloning the same template DB) in order to test copying data
between DBs. It's only these "extra" DBs that were corrupted, at least on
this occasion. (It's hard to say about the last time, because that time
it all went south, the whole server crashed, and we may also have had
some residual corruption from bad disks; who knows.) I'm not sure whether
the tests that created the extra DBs existed before we upgraded to PG 15,
but we definitely have not seen such problems on PG 13 or 14.
> It seems like you have some kind of high frequency testing workload that creates and tests databases all day long, and just occasionally detects this corruption.
Maybe 10-30 times per day normally, depending on the day. However, I
have tried to repro this by running those two specific tests thousands
of times in one day, without success.
> Would you like to try requesting FILE_COPY for a while and see if it eventually happens like that too?
Sure, we can try that.
On 7/05/2023 12:30 pm, Thomas Munro wrote:
> your "zfs get all /path/to/pgdata"
PROPERTY              VALUE                 SOURCE
type                  filesystem            -
creation              Mon Mar 6 17:07 2023  -
used                  166G                  -
available             2.34T                 -
referenced            166G                  -
compressratio         2.40x                 -
mounted               yes                   -
quota                 none                  default
reservation           none                  default
recordsize            16K                   local
mountpoint            /default              default
sharenfs              off                   default
checksum              on                    default
compression           lz4                   received
atime                 off                   inherited from pgdata
devices               on                    default
exec                  off                   inherited from pgdata
setuid                off                   inherited from pgdata
readonly              off                   default
zoned                 off                   default
snapdir               hidden                default
aclinherit            restricted            default
createtxg             90                    -
canmount              on                    received
xattr                 on                    default
copies                1                     default
version               5                     -
utf8only              off                   -
normalization         none                  -
casesensitivity       sensitive             -
vscan                 off                   default
nbmand                off                   default
sharesmb              off                   default
refquota              none                  default
refreservation        none                  default
primarycache          all                   default
secondarycache        all                   default
usedbysnapshots       199M                  -
usedbydataset         166G                  -
usedbychildren        0B                    -
usedbyrefreservation  0B                    -
logbias               latency               default
dedup                 off                   default
mlslabel              none                  default
sync                  standard              default
dnodesize             legacy                default
refcompressratio      2.40x                 -
written               64.9M                 -
logicalused           397G                  -
logicalreferenced     397G                  -
volmode               default               default
filesystem_limit      none                  default
snapshot_limit        none                  default
filesystem_count      none                  default
snapshot_count        none                  default
snapdev               hidden                default
acltype               off                   default
context               none                  default
fscontext             none                  default
defcontext            none                  default
rootcontext           none                  default
relatime              off                   default
redundant_metadata    all                   default
overlay               off                   default
> your postgresql.conf?
We have a bunch of config files, so I tried to get the resulting config
using "select name, setting from pg_settings where source =
'configuration file'"; hopefully that gives you what you wanted.
            name            |                        setting
----------------------------+-------------------------------------------------------
 archive_command            | pgbackrest --stanza="behavior-pg15" archive-push "%p"
 archive_mode               | on
 archive_timeout            | 900
 cluster_name               | 15/behavior
 DateStyle                  | ISO, MDY
 default_text_search_config | pg_catalog.english
 dynamic_shared_memory_type | posix
 external_pid_file          | /var/run/postgresql/15-behavior.pid
 full_page_writes           | off
 lc_messages                | C
 lc_monetary                | C
 lc_numeric                 | C
 lc_time                    | C
 listen_addresses           | *
 log_checkpoints            | on
 log_connections            | on
 log_disconnections         | on
 log_file_mode              | 0640
 log_line_prefix            | %m [%p] %q%u@%d
 log_lock_waits             | on
 log_min_duration_statement | 1000
 log_temp_files             | 0
 log_timezone               | Etc/UTC
 maintenance_work_mem       | 1048576
 max_connections            | 100
 max_slot_wal_keep_size     | 30000
 max_wal_size               | 1024
 min_wal_size               | 80
 port                       | 5434
 shared_buffers             | 4194304
 ssl                        | on
 ssl_cert_file              | (redacted)
 ssl_ciphers                | TLSv1.2:TLSv1.3:!aNULL
 ssl_dh_params_file         | (redacted)
 ssl_key_file               | (redacted)
 ssl_min_protocol_version   | TLSv1.2
 temp_buffers               | 10240
 TimeZone                   | Etc/UTC
 unix_socket_directories    | /var/run/postgresql
 wal_compression            | pglz
 wal_init_zero              | off
 wal_level                  | replica
 wal_recycle                | off
 work_mem                   | 262144
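The size-type values in this listing are in PostgreSQL's internal units rather than bytes (adding the `unit` column of pg_settings to the query would show which unit each one uses). As a reading aid, here is a minimal Python sketch that converts them. It assumes the server was built with the default 8 kB block size, and that the units are the ones PostgreSQL normally reports for these parameters (8kB pages for the buffer settings, kB for the work_mem settings, MB for the WAL sizes); the names `UNIT_BYTES`, `MEMORY_SETTINGS`, `human` and `readable` are hypothetical.

```python
# Hypothetical reading aid: convert raw pg_settings values into byte sizes.
# Assumption: default 8 kB block size, so a unit of "8kB" means 8192-byte pages.
UNIT_BYTES = {"8kB": 8192, "kB": 1024, "MB": 1024 ** 2}

# (name, raw setting from the listing above, unit PostgreSQL normally reports)
MEMORY_SETTINGS = [
    ("shared_buffers", 4194304, "8kB"),
    ("temp_buffers", 10240, "8kB"),
    ("work_mem", 262144, "kB"),
    ("maintenance_work_mem", 1048576, "kB"),
    ("max_wal_size", 1024, "MB"),
    ("min_wal_size", 80, "MB"),
    ("max_slot_wal_keep_size", 30000, "MB"),
]


def human(nbytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    for suffix, size in (("GB", 1024 ** 3), ("MB", 1024 ** 2), ("kB", 1024)):
        if nbytes >= size:
            return f"{nbytes / size:g} {suffix}"
    return f"{nbytes} B"


def readable(settings):
    """Map each setting name to its size as a human-readable string."""
    return {name: human(raw * UNIT_BYTES[unit]) for name, raw, unit in settings}


if __name__ == "__main__":
    for name, size in readable(MEMORY_SETTINGS).items():
        print(f"{name:24} {size}")
```

Under those assumptions, shared_buffers = 4194304 × 8 kB = 32 GB, work_mem = 262144 kB = 256 MB, and maintenance_work_mem = 1048576 kB = 1 GB.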
> And your exact Ubuntu kernel version and ZFS package versions?
Ubuntu 18.04.6
Kernel 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
zfsutils-linux package version 0.7.5-1ubuntu16.12 amd64