From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date: | 2024-04-07 10:00:00 |
Message-ID: | 5cbe0b03-d6f3-501d-3849-534568b0e776@gmail.com |
Lists: | pgsql-bugs |
Hi Robert,
05.04.2024 23:20, Robert Haas wrote:
> On Fri, Oct 29, 2021 at 9:30 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> I can propose the debugging patch to reproduce the issue that replaces
>> the hang with the assert and modifies a pair of crash-causing test
>> scripts to simplify the reproducing. (Sorry, I have no time now to prune
>> down the scripts further as I have to leave for a week.)
> Just FYI, I tried to reproduce this today on v16, using this formula,
> with some hacking around to try to get it working on my MacBook, and I
> couldn't get it to crash.
I've refreshed the script and simplified it a bit so that it no longer
relies on Linux specifics. This works for me (on REL_14_0, with the patch
applied, built with
CPPFLAGS="-O0" ./configure --enable-debug --enable-cassert ...):
echo "
autovacuum=off
fsync=off
" >> "$PGDATA/postgresql.conf"
pg_ctl -w -l server.log start
export PGDATABASE=regression
createdb regression
echo "
vacuum (verbose, skip_locked, index_cleanup off) pg_catalog.pg_class;
select pg_sleep(random()/50);
" >/tmp/17257/pseudo-autovacuum.sql
pgbench -n -f /tmp/17257/inherit.sql -C -T 1200 >pgbench-1.log 2>&1 &
pgbench -n -f /tmp/17257/vacuum.sql -C -T 1200 >pgbench-2.log 2>&1 &
pgbench -n -f /tmp/17257/pseudo-autovacuum.sql -C -c 10 -T 1200 >pgbench-3.log 2>&1 &
wait
grep -E "(TRAP|terminated)" server.log
(Please use the attached inherit.sql and vacuum.sql, placed in /tmp/17257/;
they are excerpts from src/test/regress/sql/{inherit,vacuum}.sql.)
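To note how long a run takes before the first failure, a small watcher along these lines could be run in place of the plain `wait`. This is only a sketch: the function name `wait_for_failure` and its arguments are hypothetical and not part of the original script, and it assumes a `date` that supports `+%s` (true on Linux and macOS).

```shell
# Hypothetical helper (not part of the original script): poll a server log
# until a TRAP or "terminated" line shows up, or until a timeout expires.
# Usage: wait_for_failure LOGFILE [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
wait_for_failure() {
    log=$1
    timeout=${2:-1200}
    interval=${3:-1}
    start=$(date +%s)
    while :; do
        now=$(date +%s)
        elapsed=$((now - start))
        # Same pattern as the final grep in the script above.
        if grep -E -q "(TRAP|terminated)" "$log" 2>/dev/null; then
            echo "failure detected after ${elapsed}s"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "no failure within ${timeout}s"
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `wait_for_failure server.log 1200` started right after the three pgbench jobs would report the elapsed seconds at the first match, or give up after the same 1200 seconds that pgbench is told to run.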
With PGDATA placed on tmpfs, this script failed for me after 1m31s,
2m35s, 4m12s:
TRAP: FailedAssertion("numretries < 100", File: "vacuumlazy.c", Line: 1726, PID: 951498)
Another possible outcome:
TRAP: FailedAssertion("relid == targetRelId", File: "relcache.c", Line: 1062, PID: 1257766)
And also:
2024-04-07 05:03:21.656 UTC [2905313] LOG: server process (PID 2984687) was terminated by signal 6: Aborted
2024-04-07 05:03:21.656 UTC [2905313] DETAIL: Failed process was running: create table matest0 (id serial primary key,
name text);
With the stack trace:
...
#4 0x00007fc30b4007f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000559f50220719 in index_delete_sort_cmp (deltid1=0x559f523a9f40, deltid2=0x7ffd2f9623f8) at heapam.c:7582
#6 0x0000559f50220847 in index_delete_sort (delstate=0x7ffd2f9636f0) at heapam.c:7623
...
(as in [1])
But on dad1539ae I got no failures in 3 runs (the same holds for
REL_16_STABLE with a slightly modified lazy_scan_prune patch).
[1] https://www.postgresql.org/message-id/17255-14c0ac58d0f9b583%40postgresql.org
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
inherit.sql | application/sql | 17.4 KB |
vacuum.sql | application/sql | 5.5 KB |
assert-in-lazy_scan_prune-loop.patch | text/x-patch | 581 bytes |