From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17821: Assertion failed in heap_update() due to heap pruning |
Date: | 2023-10-07 08:00:00 |
Message-ID: | 25c3399b-58a3-d727-7f97-93394fd1afa9@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi,
04.03.2023 19:48, Andres Freund wrote:
> My first suspicion would be that we aren't holding a lock in the right moment, to enforce cache invalidation processing.
>
I've encountered another case of that assertion failure (this time with pg_statistic):
numclients=10
for ((c=1;c<=numclients;c++)); do
createdb db$c
done
for ((i=1;i<=50;i++)); do
echo "iteration $i"
for ((c=1;c<=numclients;c++)); do
echo "
CREATE TABLE t(i int4, t1 text[], t2 text[]);
INSERT INTO t SELECT g, ARRAY[repeat('x', g)], ARRAY[repeat('x', g)] FROM generate_series(1, 200) g;
ANALYZE t;
" | psql -d db$c >/dev/null
done
for ((c=1;c<=numclients;c++)); do
echo "
ANALYZE t;
" | psql -d db$c >/dev/null &
echo "
SELECT * FROM t WHERE i = 1;
ANALYZE t;
" | psql -d db$c >/dev/null &
done
wait
grep 'TRAP:' server.log && { break; }
for ((c=1;c<=numclients;c++)); do
echo "
DROP TABLE t; VACUUM pg_statistic;
" | psql -d db$c >/dev/null
done
done
...
iteration 13
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3074, PID: 89697
Reproduced on REL_12_STABLE (starting from 5e0928005) .. master.
(Modifying pruneheap.c as in [1] not needed for this case.)
Here we have the following sequence of events:
Session 1
ANALYZE t; // needed just to produce material for pruning
do_analyze_rel() -> update_attstats() -> CatalogTupleUpdateWithInfo() -> heap_update(ctid(18,16));
PrepareToInvalidateCacheTuple(hashvalue: {hash1})
Session 2:
SELECT * FROM t WHERE i = 1;
SearchCatCacheInternal(hashValue: {hash1})
SearchCatCacheMiss(hashValue: {hash1})
ProcessInvalidationMessages()
CatCacheInvalidate(hashValue: {hash1})
CatCacheInvalidate(hashValue: {hash2})
CatCacheInvalidate(hashValue: {hash1})
SearchCatCacheMiss returns tid(18,16)
ANALYZE t;
do_analyze_rel() -> update_attstats()
attno=0 SearchSysCache3() -> SearchCatCacheInternal(hashValue: {hash2}) -> SearchCatCacheMiss()
-> heap_page_prune(block: 18)
attno=1 SearchSysCache3() -> SearchCatCacheInternal(hashValue: {hash1}) -> oldtup = tid(18,16)
CatalogTupleUpdateWithInfo() -> heap_update(tid(18, 16))
(A server.log with additional tracing messages for one of the failed runs
is attached.)
In other words, a backend can get some tuple id from catcache and then prune
the underlying page while searching for another (missing) entry in the cache
(any other reading of the page that may trigger pruning will do as well).
After that, an attempt to update that tuple might trigger the Assert().
[1] https://www.postgresql.org/message-id/17821-dd8c334263399284%40postgresql.org
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
server-db6.log | text/x-log | 23.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2023-10-07 12:49:21 | Re: BUG #18149: Incorrect lexeme for english token "proxy" |
Previous Message | Peter Smith | 2023-10-06 06:55:47 | Re: [16+] subscription can end up in inconsistent state |