BUG #18163: Catcache entry invalidation might be missed when toast processed inside CatalogCacheCreateEntry

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #18163: Catcache entry invalidation might be missed when toast processed inside CatalogCacheCreateEntry
Date: 2023-10-20 08:00:00
Message-ID: 18163-859bad19a43edcf6@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18163
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 16.0
Operating system: Ubuntu 22.04
Description:

[ inspired by Tom Lane's suspicions in the discussion of bug #18014 ]

When a catalog entry created inside CatalogCacheCreateEntry, for tuples
with toasted fields this function fetches toast values and thus processes
AcceptInvalidationMessages calls, that might invalidate the entry created,
but CatalogCacheCreateEntry still returns the entry as valid.

I've constructed the following demo script for the issue:
numraces=10
echo "
SELECT format('CREATE ROLE user_%s', i) FROM generate_series(1, 4000) i
\gexec" | psql >/dev/null

for ((c=1;c<=$numraces;c++)); do
createdb db$c
echo "
CREATE DOMAIN dom AS text;

SELECT format('
GRANT USAGE ON TYPE dom TO user_%s', i) FROM generate_series(1, 4000) i
\gexec

CREATE TABLE tab(time timestamp, id int, content dom);
" | psql db$c >/dev/null
done

for ((n=1;n<=300;n++)); do
echo "ITERATION $n"

for ((c=1;c<=$numraces;c++)); do
echo "
BEGIN;
ALTER DOMAIN dom SET DEFAULT 'x';
SELECT pg_sleep(0.1);
COMMIT;
SELECT pg_sleep(0.05 + random() * 0.10); -- tuning might be needed
INSERT INTO tab SELECT now(), 101; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 102; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 103; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 104; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 105;
" | psql db$c >/dev/null &

echo "
BEGIN;
SELECT pg_sleep(0.05);
SELECT oid FROM pg_type WHERE typname = 'dom' FOR UPDATE;
ROLLBACK;
ALTER DOMAIN dom SET DEFAULT 'xx';
INSERT INTO tab SELECT now(), 201; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 202; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 203; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 204; SELECT pg_sleep(0.1);
INSERT INTO tab SELECT now(), 205;
" | psql db$c >/dev/null &
done
wait

for ((c=1;c<=$numraces;c++)); do
errcnt=$(
echo "
SELECT COUNT(*) FROM tab t, tab t1, tab t2
WHERE t1.time < t.time AND t1.content <> t.content
AND t2.time > t.time AND t2.content <> t.content;
" | psql db1 -Aqt)
if [[ $errcnt == 0 ]]; then
psql db$c -c "TRUNCATE TABLE tab" >/dev/null
else
echo "Pair $c has got inconsistent domain definitions:"
psql db$c -c "SELECT time, id, content FROM tab ORDER BY time"
break
fi
done

[[ $errcnt == 0 ]] || break;
done

It shows how two sessions might see two different live catcache entries
simultaneously:
...
ITERATION 20
Pair 1 has got inconsistent domain definitions:
time | id | content
----------------------------+-----+---------
2023-10-20 10:31:30.214179 | 101 | x
2023-10-20 10:31:30.222808 | 201 | xx
2023-10-20 10:31:30.324768 | 202 | xx
2023-10-20 10:31:30.326976 | 102 | x
2023-10-20 10:31:30.42655 | 203 | xx
2023-10-20 10:31:30.428698 | 103 | x
2023-10-20 10:31:30.527381 | 204 | xx
2023-10-20 10:31:30.530396 | 104 | x
2023-10-20 10:31:30.629155 | 205 | xx
2023-10-20 10:31:30.632042 | 105 | x
(10 rows)

I've patched catcache.c to catch the anomaly easier:
@@ -40,2 +40,3 @@
#include "utils/syscache.h"
+#include "tcop/tcopprot.h"

@@ -1812,2 +1813,3 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple
ntp, Datum *arguments,
Assert(!negative);
+if (debug_query_string && strstr(debug_query_string, "now(), 101") !=
NULL) pg_usleep(1000L);

but I believe it could be seen without patching, that would just require
more time.

Browse pgsql-bugs by date

  From Date Subject
Next Message Janning Vygen 2023-10-20 10:26:06 slow pg_dump with bytea
Previous Message Laurenz Albe 2023-10-20 06:32:57 Re: group by true now errors with non-integer constant in GROUP BY