Cannot spot the difference in two queries

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Cannot spot the difference in two queries
Date: 2017-04-27 18:05:07
Message-ID: 4563a265-502b-7664-5647-ab83fd75885d@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have three very similar functions, two of which are fast and one is
slow and I cannot explain the slowness of the third. All of which, with
the correct arguments should return the same "optimal set". I present
one of the two fast versions and the slow one, in whole and in part.I'm
using postgres 9.6 on ubuntu 16.4 desktop (16G memory).

I'm confused because the bulk of the difference in the sql is in the
"with" clause and those are equally quick and return the same dataset,
and to my eye the processing after that is the same as sql but clearly
not in the server. I'm missing something obvious I'm sure but whatever
it is is causing a nasty Cartesian then sorting that out.

(And I get that mm.markerset_id is not needed in the where clause)

Any insights appreciated.
Thanks,
rjs

I have not included table defs, thinking that since they're the same for
both queries ...

Fast function

create or replace function seg.optimal_pvalue_set(people_name text,
markers_name text, chr int)
returns table (segment_id uuid, optval numeric, firstbase int) as
$$
declare
mkset uuid;
rcount int;
begin
select id into mkset from seg.markerset where name = markers_name
and chrom = chr;
--
create temp table optmarkers on commit drop as
with segset as (
select s.id
, s.chrom
, s.markerset_id
, s.startbase
, s.endbase
, ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join (select people_id, array_agg(person_id) as persons
from seg.people_member
group by people_id) as pa on i.probands <@ pa.persons
join seg.people o on pa.people_id = o.id
where
s.markerset_id = mkset
and o.name = people_name
)
select m.id as mkrid
, min(ss.pval) as optval
from segset ss
join seg.markerset_member mm on ss.markerset_id =
mm.markerset_id
join seg.marker m on mm.member_id = m.id
where
m.basepos between ss.startbase and ss.endbase
and m.chrom = ss.chrom
and mm.markerset_id = mkset --
'b474655c-80d2-47e7-bcb5-c65245195888'
group by m.id;
--
get diagnostics rcount = ROW_COUNT;
raise notice '% segments to optimize', rcount;
--
return query
select s.id as segment_id, o.optval, min(m.basepos) as firstbase
from optmarkers o
join seg.marker m on o.mkrid = m.id
join seg.markerset_member mm on m.id = mm.member_id
join seg.segment s on mm.markerset_id = s.markerset_id
where mm.markerset_id = mkset
and m.basepos between s.startbase and s.endbase
and ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) = o.optval
group by s.id, o.optval
order by firstbase;
end;
$$ language plpgsql;

-- timing the CTE base
--with, the first
select count(s.id)
-- s.id
-- , s.chrom
-- , s.markerset_id
-- , s.startbase
-- , s.endbase
-- , ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join (select people_id, array_agg(person_id) as persons
from seg.people_member
group by people_id) as pa on i.probands <@ pa.persons
join seg.people o on pa.people_id = o.id
where
s.markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'
--mkset
and o.name = '1151704'; -- people_name
count
-------
30762
(1 row)

--explain analyze of Full CTE portion (that which constructs the
temp table in prep for final query)
--manually supplying the needed ids and such.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=15412.11..15413.24 rows=65 width=48) (actual
time=5462.143..6042.901 rows=52196 loops=1)
Group Key: m.id
CTE segset
-> Nested Loop (cost=4820.07..6024.99 rows=310 width=76)
(actual time=11.854..449.869 rows=30762 loops=1)
-> Seq Scan on probandset i (cost=0.00..21.27 rows=1
width=16) (actual time=0.007..0.286 rows=57 loops=1)
Filter: ((probands <@
'{1bfa3037-bf65-483b-9ac5-f7a57f9e48a6,69f339e3-8ef4-440d-82ee-76627caf8e8b,7a2150d9-cd9c-454a-92d4-4b5cda85258d,7b42b108-8c17-4261-a4d1-3ed1f5da892d,b56886d3-981e-4872-847f-ce019fe70a87,c7f46c7d-0223-4b89-8093-771b5d6964d8}'::uuid[])
AND (people_id = '10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'::uuid))
Rows Removed by Filter: 494
-> Bitmap Heap Scan on segment s (cost=4820.07..5995.20
rows=310 width=84) (actual time=7.438..7.493 rows=540 loops=57)
Recheck Cond: ((probandset_id = i.id) AND
(markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid))
Heap Blocks: exact=646
-> BitmapAnd (cost=4820.07..4820.07 rows=310
width=0) (actual time=7.434..7.434 rows=0 loops=57)
-> Bitmap Index Scan on useg
(cost=0.00..283.82 rows=5235 width=0) (actual time=0.357..0.357
rows=6470 loops=57)
Index Cond: (probandset_id = i.id)
-> Bitmap Index Scan on
segment_markerset_id_idx (cost=0.00..4493.24 rows=170774 width=0)
(actual time=7.064..7.064 rows=150550 loops=57)
Index Cond: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
-> Sort (cost=9387.11..9387.28 rows=65 width=48) (actual
time=5462.122..5818.158 rows=1780618 loops=1)
Sort Key: m.id
Sort Method: external merge Disk: 64168kB
-> Nested Loop (cost=105.76..9385.16 rows=65 width=48)
(actual time=11.884..4648.098 rows=1780618 loops=1)
-> Nested Loop (cost=105.34..9013.49 rows=706
width=64) (actual time=11.875..944.708 rows=1796463 loops=1)
-> CTE Scan on segset ss (cost=0.00..6.97
rows=2 width=60) (actual time=11.857..461.254 rows=30762 loops=1)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
-> Bitmap Heap Scan on marker m
(cost=105.34..4471.17 rows=3209 width=24) (actual time=0.007..0.011
rows=58 loops=30762)
Recheck Cond: ((chrom = ss.chrom) AND
(basepos >= ss.startbase) AND (basepos <= ss.endbase))
Heap Blocks: exact=54458
-> Bitmap Index Scan on
marker_chrom_basepos_idx (cost=0.00..104.54 rows=3209 width=0)
(actual time=0.005..0.005 rows=58 loops=30762)
Index Cond: ((chrom = ss.chrom)
AND (basepos >= ss.startbase) AND (basepos <= ss.endbase))
-> Index Scan using markerset_member_member_id_idx
on markerset_member mm (cost=0.42..0.52 rows=1 width=32) (actual
time=0.002..0.002 rows=1 loops=1796463)
Index Cond: (member_id = m.id)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
Rows Removed by Filter: 0
Planning time: 0.478 ms
Execution time: 6052.463 ms
(33 rows)

Time: 6053.983 ms

Slow Function

-- For an (potentially) arbitrary of grouping of probandsets
create or replace function seg.optimal_pvalue_set_group(pbgroup_id
uuid, markers_name text, people_name text, chr int)
returns table (segment_id uuid, optval numeric, firstbase int) as
$$
declare
mkset uuid;
rcount int;
peop_id uuid;
begin
select id into mkset from seg.markerset where name = markers_name
and chrom = chr;
select id into peop_id from seg.people where name = people_name;
raise notice 'marker set id is %; people id is % (%)', mkset,
peop_id, clock_timestamp();
--
create temp table optmarkers on commit drop as
with segset as (
select s.id
, s.chrom
, s.markerset_id
, s.startbase
, s.endbase
, ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join seg.probandset_group_member pgm on s.probandset_id
= pgm.member_id
where
s.markerset_id = mkset
and i.people_id = peop_id
and pgm.group_id = pbgroup_id
)
select m.id as mkrid
, min(ss.pval) as optval
from segset ss
join seg.markerset_member mm on ss.markerset_id =
mm.markerset_id
join seg.marker m on mm.member_id = m.id
where
m.basepos between ss.startbase and ss.endbase
and m.chrom = ss.chrom
and mm.markerset_id = mkset
group by m.id;
get diagnostics rcount = ROW_COUNT;
raise notice 'found % segments to optimize by probandset group id
% (%)', rcount, pbgroup_id, clock_timestamp();
--
return query
select s.id as segment_id, o.optval, min(m.basepos) as firstbase
from optmarkers o
join seg.marker m on o.mkrid = m.id
join seg.markerset_member mm on m.id = mm.member_id
join seg.segment s on mm.markerset_id = s.markerset_id
where mm.markerset_id = mkset
and m.basepos between s.startbase and s.endbase
and ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) = o.optval
group by s.id, o.optval
order by firstbase;
end;
$$ language plpgsql;

--timing the CTE base
--with, the third
select count(s.id)
-- s.id
-- , s.chrom
-- , s.markerset_id
-- , s.startbase
-- , s.endbase
-- , ((s.events_equal + s.events_greater)/(1.0 *
(s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join seg.probandset_group_member pgm on s.probandset_id
= pgm.member_id
where
s.markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'
-- mkset
and i.people_id = '10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'
-- peop_id
and pgm.group_id =
'e7c01f89-8301-4221-9980-343b119711c5' -- pbgroup_id
;
count
-------
30392
(1 row)

Time: 146.692 ms

--explain analyze of Full CTE
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=39420.04..39425.18 rows=294 width=48)
(actual time=154302.816..154860.213 rows=52196 loops=1)
Group Key: m.id
CTE segset
-> Nested Loop (cost=282.86..23235.89 rows=1795 width=76)
(actual time=0.568..82.964 rows=30392 loops=1)
-> Hash Join (cost=11.11..31.27 rows=6 width=32)
(actual time=0.035..0.147 rows=56 loops=1)
Hash Cond: (i.id = pgm.member_id)
-> Seq Scan on probandset i (cost=0.00..19.89
rows=57 width=16) (actual time=0.005..0.092 rows=57 loops=1)
Filter: (people_id =
'10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'::uuid)
Rows Removed by Filter: 494
-> Hash (cost=10.41..10.41 rows=56 width=16)
(actual time=0.027..0.027 rows=56 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Bitmap Heap Scan on
probandset_group_member pgm (cost=4.71..10.41 rows=56 width=16)
(actual time=0.013..0.021 rows=56 loops=1)
Recheck Cond: (group_id =
'e7c01f89-8301-4221-9980-343b119711c5'::uuid)
Heap Blocks: exact=1
-> Bitmap Index Scan on psgm_pkey
(cost=0.00..4.70 rows=56 width=0) (actual time=0.008..0.008 rows=56
loops=1)
Index Cond: (group_id =
'e7c01f89-8301-4221-9980-343b119711c5'::uuid)
-> Bitmap Heap Scan on segment s (cost=271.75..3859.10
rows=310 width=84) (actual time=0.446..1.196 rows=543 loops=56)
Recheck Cond: (probandset_id = i.id)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
Rows Removed by Filter: 5974
Heap Blocks: exact=8223
-> Bitmap Index Scan on useg (cost=0.00..271.68
rows=5235 width=0) (actual time=0.434..0.434 rows=6517 loops=56)
Index Cond: (probandset_id = i.id)
-> Sort (cost=16184.15..16184.88 rows=294 width=48) (actual
time=154302.795..154633.935 rows=1767743 loops=1)
Sort Key: m.id
Sort Method: external merge Disk: 63712kB
-> Nested Loop (cost=40.92..16172.09 rows=294 width=48)
(actual time=96.084..153723.675 rows=1767743 loops=1)
-> Hash Join (cost=40.50..14499.61 rows=3177
width=64) (actual time=96.074..151048.722 rows=1783466 loops=1)
Hash Cond: (m.chrom = ss.chrom)
Join Filter: ((m.basepos >= ss.startbase) AND
(m.basepos <= ss.endbase))
Rows Removed by Join Filter: 1598476902
-> Seq Scan on marker m (cost=0.00..11647.63
rows=635363 width=24) (actual time=0.007..39.685 rows=635412 loops=1)
-> Hash (cost=40.39..40.39 rows=9 width=60)
(actual time=96.027..96.027 rows=30392 loops=1)
Buckets: 32768 (originally 1024)
Batches: 1 (originally 1) Memory Usage: 2432kB
-> CTE Scan on segset ss
(cost=0.00..40.39 rows=9 width=60) (actual time=0.571..92.319
rows=30392 loops=1)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
-> Index Scan using markerset_member_member_id_idx
on markerset_member mm (cost=0.42..0.52 rows=1 width=32) (actual
time=0.001..0.001 rows=1 loops=1783466)
Index Cond: (member_id = m.id)
Filter: (markerset_id =
'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
Rows Removed by Filter: 0
Planning time: 0.867 ms
Execution time: 154868.420 ms
(42 rows)

Browse pgsql-general by date

  From Date Subject
Next Message kbrannen 2017-04-27 18:40:31 Perl Dev 2017 survey results
Previous Message Andres Freund 2017-04-27 17:35:40 Re: Questionaire: Common WAL write rates on busy servers.