Parallel Query - Can it be used within functions?

From: Michael Krüger <michael(at)kruegers(dot)email>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Parallel Query - Can it be used within functions?
Date: 2018-02-06 07:24:34
Message-ID: CACSnzzVDSP3WvoxByFNiEL2RkCdKfqhmoaLDeYdMNXH5izW1XQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dear community,

I need a bit of advice on parallel query execution.
I have a table with roughly 2 million rows in it. These rows are
distributed over a set of IDs. Therefore I can group my rows based on the
ID set.

If I query the table directly like this:

select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1;
mediatrunkid | count
--------------+-------
409612 | 11862
409613 | 11862
2101260 | 11860
2101261 | 11882
2101263 | 11809
2109452 | 11751
2117644 | 11833
2125836 | 11832
2125839 | 11852
2125841 | 11882
2129932 | 11882
2129933 | 11854
2134028 | 11718
2138124 | 11850
2142220 | 11861
2154508 | 11825
2187276 | 11826
(17 rows)

Time: 44,056 ms

I see from the query plan that it executed nicely in parallel using two
workers:
2018-02-06 08:18:47.381 CET [18898] LOG: duration: 43.072 ms plan:
Query Text: select mediatrunkid,count(*)::numeric from
reports.mediatrunkkpi where mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1;
Finalize GroupAggregate (cost=57784.41..57792.66 rows=300 width=40)
(actual time=40.583..40.596 rows=17 loops=1)
Group Key: mediatrunkid
Buffers: shared hit=3076
-> Sort (cost=57784.41..57785.91 rows=600 width=16) (actual
time=40.578..40.582 rows=51 loops=1)
Sort Key: mediatrunkid
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=3076
-> Gather (cost=57693.72..57756.72 rows=600 width=16) (actual
time=40.534..40.561 rows=51 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=3076
-> Partial HashAggregate (cost=56693.72..56696.72 rows=300
width=16) (actual time=33.901..33.909 rows=17 loops=3)
Group Key: mediatrunkid
Buffers: shared hit=6006
-> Parallel Bitmap Heap Scan on mediatrunkkpi
(cost=4525.01..56279.28 rows=82889 width=8) (actual time=4.575..23.145
rows=67080 loops=3)
Recheck Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Heap Blocks: exact=2253
Buffers: shared hit=6006
-> Bitmap Index Scan on idx_mediatrunkkpi_trunk
(cost=0.00..4475.27 rows=198933 width=0) (actual time=10.057..10.057
rows=201241 loops=1)
Index Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Buffers: shared hit=823

But when I do the very same within a function:

create or replace function reports.generic_query(_sql text)
RETURNS SETOF record
LANGUAGE 'plpgsql'
PARALLEL SAFE
COST 100
SET "TimeZone"='utc'
STABLE
ROWS 10000
AS $BODY$
DECLARE
--
BEGIN
RETURN QUERY EXECUTE _sql;
END
$BODY$;

select * from reports.generic_query($$ select
mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1 $$) as foo (trunkid bigint, count numeric);
trunkid | count
---------+-------
2109452 | 11751
409613 | 11862
2125839 | 11852
2125841 | 11882
2142220 | 11861
2117644 | 11833
2154508 | 11825
2134028 | 11718
2101263 | 11809
2101261 | 11882
2129933 | 11854
2129932 | 11882
2125836 | 11832
2138124 | 11850
409612 | 11862
2187276 | 11826
2101260 | 11860
(17 rows)

Time: 86,275 ms

the execution time almost doubles, and the query no longer is considered
for parallel execution:

2018-02-06 08:20:30.553 CET [18898] LOG: duration: 85.358 ms plan:
Query Text: select mediatrunkid,count(*)::numeric from
reports.mediatrunkkpi where mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1
HashAggregate (cost=60900.33..60904.08 rows=300 width=40) (actual
time=85.341..85.348 rows=17 loops=1)
Group Key: mediatrunkid
Buffers: shared hit=6006
-> Bitmap Heap Scan on mediatrunkkpi (cost=4525.01..59905.66
rows=198933 width=8) (actual time=9.084..39.655 rows=201241 loops=1)
Recheck Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Heap Blocks: exact=5183
Buffers: shared hit=6006
-> Bitmap Index Scan on idx_mediatrunkkpi_trunk
(cost=0.00..4475.27 rows=198933 width=0) (actual time=8.386..8.386
rows=201241 loops=1)
Index Cond: (mediatrunkid = ANY
('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
Buffers: shared hit=823
2018-02-06 08:20:30.553 CET [18898] CONTEXT: PL/pgSQL function
reports.generic_query(text) line 5 at RETURN QUERY
2018-02-06 08:20:30.553 CET [18898] LOG: duration: 85.986 ms plan:
Query Text: select * from reports.generic_query($$ select
mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
mediatrunkid in
(2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
group by 1 $$) as foo (trunkid bigint, count numeric);
Function Scan on generic_query foo (cost=0.25..100.25 rows=10000 width=40)
(actual time=85.968..85.970 rows=17 loops=1)
Buffers: shared hit=6006

This whole topic around parallel execution is fairly new, but from the
documentation I would assume that functions declared as parallel safe, will
really be used in parallel. And even if the function itself is not invoked
in parallel (maybe does not even make sense here), the function body for
sure should run in parallel if I'm not mistaken.

So what do I obviously do wrong here?

Regards,
Michael

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Kretschmer 2018-02-06 07:35:12 Re: Parallel Query - Can it be used within functions?
Previous Message Michael Paquier 2018-02-06 07:19:48 Re: Alter table set logged hanging after writing out all WAL