Re: BUG #17821: Assertion failed in heap_update() due to heap pruning

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17821: Assertion failed in heap_update() due to heap pruning
Date: 2023-10-07 08:00:00
Message-ID: 25c3399b-58a3-d727-7f97-93394fd1afa9@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

04.03.2023 19:48, Andres Freund wrote:
> My first suspicion would be that we aren't holding a lock in the right moment, to enforce cache invalidation processing.
>

I've encountered another case of that assertion failure (this time with pg_statistic):
numclients=10
for ((c=1;c<=numclients;c++)); do
  createdb db$c
done

for ((i=1;i<=50;i++)); do
  echo "iteration $i"

  for ((c=1;c<=numclients;c++)); do
    echo "
CREATE TABLE t(i int4, t1 text[], t2 text[]);
INSERT INTO t SELECT g,  ARRAY[repeat('x', g)], ARRAY[repeat('x', g)] FROM generate_series(1, 200) g;
ANALYZE t;
    " | psql -d db$c >/dev/null
  done

  for ((c=1;c<=numclients;c++)); do
    echo "
ANALYZE t;
    " | psql -d db$c >/dev/null &

    echo "
SELECT * FROM t WHERE i = 1;
ANALYZE t;
    " | psql -d db$c >/dev/null &
  done
  wait
  grep 'TRAP:' server.log && { break; }

  for ((c=1;c<=numclients;c++)); do
    echo "
DROP TABLE t; VACUUM pg_statistic;
    " | psql -d db$c >/dev/null
  done
done

...
iteration 13
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3074, PID: 89697

Reproduced on REL_12_STABLE (starting from 5e0928005) .. master.
(Modifying pruneheap.c as in [1] not needed for this case.)

Here we have the following sequence of events:
Session 1
ANALYZE t; // needed just to produce material for pruning
    do_analyze_rel() -> update_attstats() -> CatalogTupleUpdateWithInfo() -> heap_update(ctid(18,16));
PrepareToInvalidateCacheTuple(hashvalue: {hash1})
                Session 2:
                SELECT * FROM t WHERE i = 1;
                    SearchCatCacheInternal(hashValue: {hash1})
                        SearchCatCacheMiss(hashValue: {hash1})
    ProcessInvalidationMessages()
    CatCacheInvalidate(hashValue: {hash1})
    CatCacheInvalidate(hashValue: {hash2})
                            CatCacheInvalidate(hashValue: {hash1})
                        SearchCatCacheMiss returns tid(18,16)

                ANALYZE t;
                    do_analyze_rel() -> update_attstats()
                        attno=0 SearchSysCache3() -> SearchCatCacheInternal(hashValue: {hash2}) -> SearchCatCacheMiss()
-> heap_page_prune(block: 18)
                        attno=1 SearchSysCache3() -> SearchCatCacheInternal(hashValue: {hash1}) -> oldtup = tid(18,16)
                        CatalogTupleUpdateWithInfo() -> heap_update(tid(18, 16))

(A server.log with additional tracing messages for one of the failed runs
is attached.)

In other words, a backend can get some tuple id from catcache and then prune
the underlying page while searching for another (missing) entry in the cache
(any other reading of the page that may trigger pruning will do as well).
After that, an attempt to update that tuple might trigger the Assert().

[1] https://www.postgresql.org/message-id/17821-dd8c334263399284%40postgresql.org

Best regards,
Alexander

Attachment Content-Type Size
server-db6.log text/x-log 23.9 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Laurenz Albe 2023-10-07 12:49:21 Re: BUG #18149: Incorrect lexeme for english token "proxy"
Previous Message Peter Smith 2023-10-06 06:55:47 Re: [16+] subscription can end up in inconsistent state