Quick Links

Re: SQLFunctionCache and generic plans

From:	Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Subject:	Re: SQLFunctionCache and generic plans
Date:	2025-01-29 14:35:52
Message-ID:	ad7296dd981c5f8eb76ff129913a55fd@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane писал(а) 2025-01-17 21:27:
> Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru> writes:
>> I've rebased patch on master. Tests pass here.
>
> The cfbot still doesn't like it; my guess is that you built without
> --with-libxml and so didn't notice the effects on xml.out.

Hi. Thank you for review.

I've updated patch.

> I've looked through the patch briefly and have a few thoughts:
>
> * You cannot use plancache.c like this:
>
> plansource = CreateCachedPlan(NULL, fcache->src,
> CreateCommandTag((Node *)parsetree));
>
> CreateCachedPlan supposes that raw_parse_tree == NULL means an empty
> query. Now aside from the fact that you didn't update the comment
> documenting that, this won't work, because RevalidateCachedQuery
> also knows that raw_parse_tree == NULL means an empty query. If an
> invalidation happens while the function is running, revalidation
> will produce an empty plan list, causing failure. (It seems like
> a regression test case exercising such invalidation is called for.)

Yes, yes... I've spent some time, trying to reproduce it or to prove
that
plan revalidation is impossible (as I considered). I was wrong, plan
revaldiation is possible, and I've added some test cases. First of all,
we don't save our plans to CacheMemoryContext, so usual invalidation
callbacks don't touch our cached plans. However, I've managed
to reproduce it by changing row_security GUC in SQL function. So,
after the first round of creating plan and running them, on second call
we can see that plansource->rewriteRowSecurity mismatches row_security
and invalidate the plan. I've added Query to plansource, and rewrite it
in RevalidateCachedQuery(), as you suggested.

Interesting side effect - earlier if row_security was altered in sql
function,
this didn't have impact on further function execution (generic plan was
already built).
Now, if revalidation for generic plan is triggered, query ends up with
"ERROR: query would be affected by row-level security policy for
table".

>
> * I'm not very convinced by the new logic in
> StmtPlanRequiresRevalidation. Is it really the case that "check for
> CMD_UTILITY" is sufficient? Even if the coding is okay, the comment
> needs work to provide a better explanation why it's okay.

Updated logic to match stmt_requires_parse_analysis().

>
> * I'd say lose the enable_sql_func_custom_plans GUC. We don't have
> anything comparable for plpgsql and there's been (relatively) little
> complaint about that. Aside from adding clutter to the patch, it
> contorts the logic in functions.c because it forces you to support
> two code paths.

Removed it.

>
> * The regression output changes like
>
> -CONTEXT: SQL function "regexp_match" statement 1
> +CONTEXT: SQL function "regexp_match" during startup
>
> seem fairly unfortunate. Aside from making maintenance of the patch
> painful, the changed messages provide strictly less information to the
> user. I would try hard to eliminate that behavioral change. I think
> you could do so, or at least greatly reduce the number of diffs, by
> having init_sql_fcache perform only raw parsing (producing a list of
> RawStmts, or extracting a list of Queries from prosqlbody) and
> delaying creation of a plan for a particular statement until you first
> reach execution of that statement. This would be a better way all
> around because it'd also fix the anomalies I mentioned upthread for
> cases where a DDL statement earlier in the function should affect
> parse analysis of a later statement.

Yes, I've seen this remark. However, was a bit frightened to touch
all guts of functions.c processing. So, instead I've just recorded
a statement number of the currently-planning query and use it in
error messages. If you insist, I can try rewriting current
first-plan-all-then-run approach, but without good tests for
sql functions this looks error-prone.

>
> * release_plans seems kind of wrong: isn't it giving up most of
> the benefit of doing this? plpgsql doesn't seem to release plans
> unless it's forced to revalidate. Why should SQL functions
> behave differently?

The issue is that we potentially create custom plan for each set
of sql function arguments. And in queries like
SELECT f(field) FROM T1
this can consume much memory. The good thing here is that we don't care
much about it if execution switches to generic plans -
after some rows are selected and some custom plans are built.

Of course, there are bad cases. For example, we've seen that
some simple functions like

create function test(n int) returns text as $$
select string_agg(i::text, '') FROM generate_series(1,n) i
$$
language sql;

always use custom plan for small n numbers - it's the same, but cheaper
(as we can't predict correct row number without knowing function
arguments):

Aggregate (cost=0.15..0.16 rows=1 width=32)
-> Function Scan on generate_series i (cost=0.00..0.08 rows=8
width=4)

versus

Aggregate (cost=17.50..17.52 rows=1 width=32)
-> Function Scan on generate_series i (cost=0.00..10.00
rows=1000 width=4)

So if it appears somewhere deep in data generation scenarios, planning
time
can sufficiently rise (earlier generic plan was always used).

Turning back to release_plans(), it's needed, because there's
possibility
that for each set of arguments we create new custom plan. It doesn't
affect
generic plan, as plansource itself holds reference to it.

> Also, I'm pretty sure it's wrong that
> you have
> + ReleaseCachedPlan(cplan, NULL);
> with no ResourceOwner tracking the reference. That probably
> means that the code leaks cached plans on error. For the
> current patch iteration with the function's data structure only
> meant to live as long as the query, it should be sufficient to
> use CurrentResourceOwner to manage these cplan references.

Cached plans are not stored in long-lived context for now. So we can't
use CurrentResourceOwner,
ReleaseCachedPlan(cplan, CurrentResourceOwner) will die on
"Assert(plan->is_saved)".

>
> * The patch is hard to read in functions.c because there's a mix of
> code-motion and actual logic changes that are touching the same
> places. Perhaps it'd be worth splitting it into a series of two
> patches: the first one just does code motion such as pushing existing
> logic into a new subroutine, and then the second one has the logic
> changes. Or maybe that wouldn't help much, but consider giving it a
> try.

Moved splitting of check_planned_stmt() into separate patch.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment	Content-Type	Size
0002-Use-custom-plan-machinery-for-SQL-function.patch	text/x-diff	23.1 KB
0001-Split-out-SQL-functions-checks-from-init_execution_s.patch	text/x-diff	3.1 KB

In response to

Re: SQLFunctionCache and generic plans at 2025-01-17 18:27:36 from Tom Lane

Responses

Re: SQLFunctionCache and generic plans at 2025-01-30 08:50:35 from Alexander Pyhalov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bertrand Drouvot	2025-01-29 14:37:45	Re: Show WAL write and fsync stats in pg_stat_io
Previous Message	Melanie Plageman	2025-01-29 14:08:37	Re: Eagerly scan all-visible pages to amortize aggressive vacuum