Re: performance of sql and plpgsql functions

From: Julius Tuskenis <julius(dot)tuskenis(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance of sql and plpgsql functions
Date: 2024-06-18 13:03:17
Message-ID: 09c67567-f351-415f-af2d-76f7e8661d12@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2024-06-17 17:24, Tom Lane wrote:
> Julius Tuskenis<julius(dot)tuskenis(at)gmail(dot)com> writes:
>> EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
>>   SELECT
>>     COALESCE(sum(mok_nepadengta), 0)
>>   FROM
>>     public.b_pardavimai
>>     JOIN public.b_mokejimai ON (mok_pardavimas = pard_id)
>>   WHERE
>>     (pard_tipas = ANY('{1, 2, 6, 7}'))
>>     AND (mok_saskaita = 7141968)
> I believe that the SQL-language function executor always uses generic
> plans for parameterized queries (which is bad, but nobody's gotten
> round to improving it). So the above is a poor way of investigating
> what will happen, because it corresponds to a custom plan for the
> value 7141968. You should try something like
>
> PREPARE p(integer) AS
> SELECT COALESCE ...
> ... AND (mok_saskaita = $1);
>
> SET plan_cache_mode TO force_generic_plan;
>
> EXPLAIN ANALYZE EXECUTE p(7141968);
>
> What I suspect is that the statistics for mok_saskaita are
> highly skewed and so with a generic plan the planner will
> not risk using a plan that depends on the parameter value
> being infrequent, as the one you're showing does.
>
> regards, tom lane

Thank you Tom Lane, for pointing the problem.

In deed, after setting plan_cache_mode to force_generic_plan I see very
different plan:

```

"Finalize Aggregate  (cost=6901.01..6901.02 rows=1 width=32) (actual
time=50.258..56.004 rows=1 loops=1)"
"  Output: COALESCE(sum((b_mokejimai.mok_nepadengta)::numeric),
'0'::numeric)"
"  Buffers: shared hit=4"
"  ->  Gather  (cost=6900.89..6901.00 rows=1 width=32) (actual
time=0.809..55.993 rows=2 loops=1)"
"        Output: (PARTIAL sum((b_mokejimai.mok_nepadengta)::numeric))"
"        Workers Planned: 1"
"        Workers Launched: 1"
"        Buffers: shared hit=4"
"        ->  Partial Aggregate  (cost=5900.89..5900.90 rows=1 width=32)
(actual time=0.077..0.079 rows=1 loops=2)"
"              Output: PARTIAL sum((b_mokejimai.mok_nepadengta)::numeric)"
"              Buffers: shared hit=4"
"              Worker 0: actual time=0.052..0.053 rows=1 loops=1"
"              ->  Nested Loop  (cost=25.92..5897.69 rows=1280 width=3)
(actual time=0.070..0.072 rows=0 loops=2)"
"                    Output: b_mokejimai.mok_nepadengta"
"                    Inner Unique: true"
"                    Buffers: shared hit=4"
"                    Worker 0: actual time=0.043..0.043 rows=0 loops=1"
"                    ->  Parallel Bitmap Heap Scan on
public.b_mokejimai  (cost=25.48..2455.36 rows=1307 width=7) (actual
time=0.069..0.070 rows=0 loops=2)"
"                          Output: b_mokejimai.mok_id,
b_mokejimai.mok_moketojas, b_mokejimai.mok_pardavimas,
b_mokejimai.mok_laikas, b_mokejimai.mok_suma, b_mokejimai.mok_budas,
b_mokejimai.mok_terminas, b_mokejimai.mok_cekis,
b_mokejimai.mok_saskaita, b_mokejimai.mok_suma_bazine,
b_mokejimai.mok_nepadengta, b_mokejimai.mok_padengta,
b_mokejimai.mok_laiko_diena"
"                          Recheck Cond: (b_mokejimai.mok_saskaita = $1)"
"                          Buffers: shared hit=4"
"                          Worker 0: actual time=0.042..0.042 rows=0
loops=1"
"                          ->  Bitmap Index Scan on idx_saskaita 
(cost=0.00..24.93 rows=2222 width=0) (actual time=0.023..0.023 rows=0
loops=1)"
"                                Index Cond: (b_mokejimai.mok_saskaita =
$1)"
"                                Buffers: shared hit=4"
"                    ->  Index Scan using pk_b_pardavimai_id on
public.b_pardavimai  (cost=0.44..2.63 rows=1 width=4) (never executed)"
"                          Output: b_pardavimai.pard_id,
b_pardavimai.pard_preke, b_pardavimai.pard_kaina,
b_pardavimai.pard_nuolaida, b_pardavimai.pard_kiekis,
b_pardavimai.pard_kasos_nr, b_pardavimai.pard_laikas,
b_pardavimai.pard_prekes_id, b_pardavimai.pard_pirkejo_id,
b_pardavimai.pard_pardavejas, b_pardavimai.pard_spausdinta,
b_pardavimai.pard_reikia_grazinti, b_pardavimai.pard_kam_naudoti,
b_pardavimai.pard_susieta, b_pardavimai.pard_galima_anuliuoti,
b_pardavimai.pard_tipas, b_pardavimai.pard_pvm,
b_pardavimai.pard_apsilankymas, b_pardavimai.pard_fk,
b_pardavimai.pard_kelintas, b_pardavimai.pard_precekis,
b_pardavimai.pard_imone, b_pardavimai.pard_grazintas,
b_pardavimai.pard_debeto_sutartis, b_pardavimai.pard_kaina_be_nld,
b_pardavimai.pard_uzsakymas_pos, b_pardavimai.pard_pvm_suma,
b_pardavimai.pard_uzsakymo_nr, b_pardavimai.pard_nuolaidos_id,
b_pardavimai.pard_nuolaida_taikyti,
b_pardavimai.pard_pirkeja_keisti_galima,
b_pardavimai.pard_suma_keisti_galima"
"                          Index Cond: (b_pardavimai.pard_id =
b_mokejimai.mok_pardavimas)"
"                          Filter: (b_pardavimai.pard_tipas = ANY
('{1,2,6,7}'::integer[]))"
"Planning Time: 0.016 ms"
"Execution Time: 56.097 ms"

```

If I understand the plan correctly, the problem is the planner expects
to find 2222 records for a provide value of `mok_saskaita`.  I've tried
running analyze on `b_mokejimai`, but the plan remains the same - must
be because some values of `mok_saskaita` do really return tens of
thousands of records.

I don't know how the planner comes up with value 2222, because on
average there are 15 b_mokejimai records for a single mok_saskaita (if
NULL in mok_saskata is ignored), and 628 records if not.

Anyway...

Do you think rewriting a function in plpgsql is a way to go in such
case? In pg documentation
(https://www.postgresql.org/docs/12/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING)
I read that the plan for the plpgsql function is calculated the first
time the function is executed (for a connection). I'm concerned, that
the function execution is not replanned: I will be stuck with a plan
that corresponds to the `mok_saskaita` parameter value passed on the
first execution. Or am I wrong?

Is there a way to make PostgreSQL recalculate the plan on each execution
of the function? The observed planning times are acceptable for my
application.

Regards,

Julius Tuskenis

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Simpson 2024-06-24 22:28:13 Row level security
Previous Message Tom Lane 2024-06-17 14:24:07 Re: performance of sql and plpgsql functions