From: | Игорь Выскорко <vyskorko(dot)igor(at)yandex(dot)ru> |
---|---|
To: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Function's execute overhead reducing |
Date: | 2019-11-18 05:23:23 |
Message-ID: | 116440101574054603@myt2-f25485a938c5.qloud-c.yandex.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> My suspicion is that the query in the function is hidden from the
> planner and so it ends up running two separate SELECT queries without
> reference to each other. A quick test and possible solution:
>
> My results for the above on old machine.:
>
> Straight query:
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on tst (cost=0.00..1644807.00 rows=500000 width=116) (actual
> time=0.033..2808.596 rows=500000 loops=1)
> SubPlan 1
> -> Aggregate (cost=3.25..3.26 rows=1 width=8) (actual
> time=0.005..0.005 rows=1 loops=500000)
> -> Function Scan on jsonb_each _a (cost=0.00..1.00
> rows=100 width=64) (actual time=0.003..0.003 rows=3 loops=500000)
> Planning Time: 16.162 ms
> Execution Time: 2846.815 ms
>
> Function in query
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------
> Seq Scan on tst (cost=0.00..138557.00 rows=500000 width=116) (actual
> time=0.119..7048.285 rows=500000 loops=1)
> Planning Time: 0.105 ms
> Execution Time: 7098.057 ms
>
> I changed the function to:
>
> CREATE OR REPLACE FUNCTION public.tst_func(i integer, a jsonb, b jsonb)
> RETURNS TABLE(id integer, val bigint)
> LANGUAGE sql
> STABLE
> AS $function$
> select
> i,
> sum(
> ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
> )
> from
> jsonb_each(a) _a
> $function$
>
> Using 'table' function:
>
> test=# explain analyze select
> tst.id,
> j1,
> j2
> from
> tst
> join
> tst_func(id, j1, j2) as f
> on
> tst.id = f.id;
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=1.00..531057.00 rows=1 width=108) (actual
> time=0.042..2002.258 rows=500000 loops=1)
> -> Seq Scan on tst (cost=0.00..13557.00 rows=500000 width=108)
> (actual time=0.014..70.936 rows=500000 loops=1)
> -> Subquery Scan on f (cost=1.00..1.02 rows=1 width=4) (actual
> time=0.003..0.003 rows=1 loops=500000)
> Filter: (tst.id = f.id)
> -> Aggregate (cost=1.00..1.01 rows=1 width=12) (actual
> time=0.003..0.003 rows=1 loops=500000)
> -> Function Scan on jsonb_each _a (cost=0.00..1.00
> rows=100 width=0) (actual time=0.003..0.003 rows=3 loops=500000)
> Planning Time: 0.297 ms
> Execution Time: 2037.601 ms
>
>> PS current situation maybe solved by denormalization with precalculations of function and storing results along with data, but it's not the way i would like to use, because it leads to more issues to solve (invalidation, for instance)
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
Thanks a lot! It works even better than subquery
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-11-18 06:42:10 | Re: REINDEX VERBOSE unknown option |
Previous Message | Andrew Gierth | 2019-11-18 01:56:14 | Re: here does postgres take its timezone information from? |