From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used |
Date: | 2023-10-14 07:00:00 |
Message-ID: | defada9d-8bb1-860f-2682-eee03fdc0ab4@gmail.com |
Lists: | pgsql-bugs |
13.10.2023 18:00, Alexander Lakhin wrote:
>
>> I spent some time looking through existing SearchSysCacheExists calls,
>> and I could only find two sets of routines where we seem to be
>> depending on SearchSysCacheExists to protect a subsequent lookup
>> somewhere else, and there isn't any lock on the object in question.
>> Those are the has_foo_privilege functions discussed here, and the
>> foo_is_visible functions near the bottom of namespace.c. I'm not
>> sure why we've not heard complaints traceable to the foo_is_visible
>> family. Maybe nobody has tried hard to break them, or maybe they
>> are just less likely to be used in ways that are at risk.
>
> I'll try to research/break xxx_is_visible and share my findings tomorrow.
>
I tried a script based on the initial reproducer [1]:
for ((n=1;n<=30;n++)); do
  echo "ITERATION $n"
  numclients=30
  for ((c=1;c<=$numclients;c++)); do
    cat << EOF | psql >psql_$c.log &
CREATE SCHEMA testxmlschema_$c;
SELECT format('CREATE TABLE testxmlschema_$c.test_%s (a int);', g) FROM
generate_series(1, 30) g
\\gexec
SET parallel_setup_cost = 1;
SET min_parallel_table_scan_size = '1kB';
SELECT oid FROM pg_catalog.pg_class WHERE relnamespace = 1 AND
relkind IN ('r', 'm', 'v') AND pg_catalog.pg_table_is_visible(oid);
SELECT format('DROP TABLE testxmlschema_$c.test_%s', g) FROM
generate_series(1, 30) g
\\gexec
DROP SCHEMA testxmlschema_$c;
EOF
  done
  wait
  grep 'ERROR:' server.log && break;
done
And I couldn't get the error over multiple runs. (Here SELECT oid ... is
based on the query executed by schema_to_xmlschema().)
But I could reliably get the error with
s/pg_table_is_visible(oid)/has_table_privilege (oid, 'SELECT')/.
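For anyone repeating this, the substitution can be applied mechanically. A minimal sketch, assuming the script above was saved to a file (repro_visible.sh below is a hypothetical name, as is the helper function); sed's basic regex syntax treats the unescaped parentheses as literal characters:

```shell
# Hypothetical helper: rewrite the visibility check into the privilege check.
# In basic regular expressions, unescaped ( and ) match themselves.
to_privilege_variant() {
    sed "s/pg_table_is_visible(oid)/has_table_privilege (oid, 'SELECT')/"
}

# Example on the one line of the script that changes:
echo "relkind IN ('r', 'm', 'v') AND pg_catalog.pg_table_is_visible(oid);" | to_privilege_variant
# prints: relkind IN ('r', 'm', 'v') AND pg_catalog.has_table_privilege (oid, 'SELECT');
```

To produce the second variant of the script: to_privilege_variant < repro_visible.sh > repro_privilege.sh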
So there is a difference between these two functions. And the difference is
in their costs.
If I do "ALTER FUNCTION pg_table_is_visible COST 1" before the script,
I get the error as expected.
With cost 10 I see the following plan:
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.42..2922.38 rows=1 width=4)
   Index Cond: (relnamespace = '1'::oid)
   Filter: ((relkind = ANY ('{r,m,v}'::"char"[])) AND pg_table_is_visible(oid))
But with cost 1:
 Gather  (cost=1.00..257.10 rows=1 width=4)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on pg_class  (cost=0.00..256.00 rows=1 width=4)
         Filter: (pg_table_is_visible(oid) AND (relnamespace = '1'::oid) AND (relkind = ANY ('{r,m,v}'::"char"[])))
         Rows Removed by Filter: 405
The cost of the pg_foo_is_visible functions was increased in a80889a73.
But all the has_xxx_privilege functions have cost 1, except for
has_any_column_privilege, whose cost was also increased in 7449427a1.
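The costs mentioned here are stored in pg_proc.procost, so they can be checked on the server under test. A sketch, assuming psql connects with default settings; the helper name procost_query is hypothetical, and functions with several overloads (such as has_table_privilege) will show up as several rows:

```shell
# Hypothetical helper: print a query listing the planner cost (pg_proc.procost)
# of the functions discussed above.
procost_query() {
    cat << 'EOF'
SELECT proname, procost
FROM pg_catalog.pg_proc
WHERE proname IN ('pg_table_is_visible', 'has_table_privilege',
                  'has_any_column_privilege')
ORDER BY proname;
EOF
}

# Print the query text; to run it, pipe it to psql: procost_query | psql
procost_query
```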
So to see the issue we need several ingredients:
1) The mode CATCACHE_FORCE_RELEASE enabled (maybe some other way is
possible, but I don't know of one);
- Thanks to prion for that.
2) A function with the coding pattern
"SearchSysCacheExistsX(); SearchSysCacheX();" called in a parallel worker;
- Thanks to "debug_parallel_query = regress" and low cost of
has_table_privilege() called by schema_to_xmlschema().
3) The catalog cache invalidated by some concurrent activity.
- Thanks to running the test xmlmap in parallel with 16 other tests.
[1] https://www.postgresql.org/message-id/18014-28c81cb79d44295d%40postgresql.org
Best regards,
Alexander