Re: SQLFunctionCache and generic plans

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: SQLFunctionCache and generic plans
Date: 2024-09-20 12:06:27
Message-ID: CAPpHfdue9RVJKo33jKwrCeEXec0BTTH7CTxVcyXGMbfMExaGLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Alexander!

On Tue, Sep 3, 2024 at 10:33 AM Alexander Pyhalov
<a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
> Tom Lane писал(а) 2023-02-07 18:29:
> > Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> writes:
> >> The following comment can be found in functions.c, about the
> >> SQLFunctionCache:
> >
> >> * Note that currently this has only the lifespan of the calling
> >> query.
> >> * Someday we should rewrite this code to use plancache.c to save
> >> parse/plan
> >> * results for longer than that.
> >
> >> I would be interested in working on this, primarily to avoid this
> >> problem of
> >> having generic query plans for SQL functions but maybe having a longer
> >> lived
> >> cache as well would be nice to have.
> >> Is there any reason not too, or pitfalls we would like to avoid ?
> >
> > AFAIR it's just lack of round tuits. There would probably be some
> > semantic side-effects, though if you pay attention you could likely
> > make things better while you are at it. The existing behavior of
> > parsing and planning all the statements at once is not very desirable
> > --- for instance, it doesn't work to do
> > CREATE TABLE foo AS ...;
> > SELECT * FROM foo;
> > I think if we're going to nuke this code and start over, we should
> > try to make that sort of case work.
> >
> > regards, tom lane
>
> Hi.
>
> I've tried to make SQL functions use CachedPlan machinery. The main goal
> was to allow SQL functions to use custom plans
> (the work was started from question - why sql function is so slow
> compared to plpgsql one). It turned out that
> plpgsql function used custom plan and eliminated scan of all irrelevant
> sections, but
> exec-time pruning didn't cope with pruning when ScalarArrayOpExpr,
> filtering data using int[] parameter.
>
> In current prototype there are two restrictions. The first one is that
> CachecPlan has lifetime of a query - it's not
> saved for future use, as we don't have something like plpgsql hashtable
> for long live function storage. Second -
> SQL language functions in sql_body form (with stored queryTree_list) are
> handled in the old way, as we currently lack
> tools to make cached plans from query trees.
>
> Currently this change solves the issue of inefficient plans for queries
> over partitioned tables. For example, function like
>
> CREATE OR REPLACE FUNCTION public.test_get_records(ids integer[])
> RETURNS SETOF test
> LANGUAGE sql
> AS $function$
> select *
> from test
> where id = any (ids)
> $function$;
>
> for hash-distributed table test can perform pruning in plan time and
> can have plan like
>
> Append (cost=0.00..51.88 rows=26 width=36)
> -> Seq Scan on test_0 test_1 (cost=0.00..25.88 rows=13
> width=36)
> Filter: (id = ANY ('{1,2}'::integer[]))
> -> Seq Scan on test_2 (cost=0.00..25.88 rows=13 width=36)
> Filter: (id = ANY ('{1,2}'::integer[]))
>
> instead of
>
> Append (cost=0.00..155.54 rows=248 width=36)
> -> Seq Scan on test_0 test_1 (cost=0.00..38.58 rows=62
> width=36)
> Filter: (id = ANY ($1))
> -> Seq Scan on test_1 test_2 (cost=0.00..38.58 rows=62
> width=36)
> Filter: (id = ANY ($1))
> -> Seq Scan on test_2 test_3 (cost=0.00..38.58 rows=62
> width=36)
> Filter: (id = ANY ($1))
> -> Seq Scan on test_3 test_4 (cost=0.00..38.58 rows=62
> width=36)
> Filter: (id = ANY ($1))
>
> This patch definitely requires more work, and I share it to get some
> early feedback.
>
> What should we do with "pre-parsed" SQL functions (when prosrc is
> empty)? How should we create cached plans when we don't have raw
> parsetrees?
> Currently we can create cached plans without raw parsetrees, but this
> means that plan revalidation doesn't work, choose_custom_plan()
> always returns false and we get generic plan. Perhaps, we need some form
> of GetCachedPlan(), which ignores raw_parse_tree?

I don't think you need a new form of GetCachedPlan(). Instead, it
seems that StmtPlanRequiresRevalidation() should be revised. As I got
from comments and the d8b2fcc9d4 commit message, the primary goal was
to skip revalidation of utility statements. Skipping revalidation was
a positive side effect, as long as we didn't support custom plans for
them anyway. But as you're going to change this,
StmtPlanRequiresRevalidation() needs to be revised.

I also think it's not necessary to implement long-lived plan cache in
the initial patch. The work could be split into two patches. The
first could implement query lifetime plan cache. This is beneficial
already by itself as you've shown by example. The second could
implement long-lived plan cache.

I appreciate your work in this direction. I hope you got the feedback
to go ahead and work on remaining issues.

------
Regards,
Alexander Korotkov
Supabase

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-09-20 12:08:28 Re: not null constraints, again
Previous Message vignesh C 2024-09-20 12:04:52 Re: Documentation to upgrade logical replication cluster