Sequential parallel scan / agg

From: James Sewell <james(dot)sewell(at)jirotech(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Sequential parallel scan / agg
Date: 2016-11-13 23:04:44
Message-ID: CAANVwEtjzthitYgkyOB_mQ=7G=-2EjdyfTb9XQL_3MycF-RkXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

I have a customer who is making use of the new parallel features on a 64
core box - this is working really well for them. However one issue (which
I've since realised I've never seen an example of) is getting the data from
these queries into a table.

I can't seem to create a plan which does this and still uses multiple
workers, probably best to just show by example.

Below you can see:

1. *INSERT ... SELECT *doesn't work
2. *WITH a AS (SELECT ...) INSERT FROM* a doesn't work
3. *SELECT .. INTO* doesn't work
4. *SELECT* with no insert does work as expected

test=# explain
insert into jirotech.test
(select count(*) FROM meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2'
day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00' )
;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Insert on test (cost=13336866.41..13336866.44 rows=1 width=438)
-> Subquery Scan on "*SELECT*" (cost=13336866.41..13336866.44 rows=1
width=438)
-> Aggregate (cost=13336866.41..13336866.42 rows=1 width=8)
-> Append (cost=0.00..12405302.12 rows=372625715 width=0)
-> Seq Scan on meter_read mr (cost=0.00..0.00 rows=1
width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Seq Scan on meter_read_2016_03_29 mr_1
(cost=0.00..6201295.24 rows=186240748 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Seq Scan on meter_read_2016_03_28 mr_2
(cost=0.00..6204006.88 rows=186384966 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
(10 rows)

test=# explain
with a as
(select count(*) FROM meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2'
day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00' )
INSERT INTO jirotech.test SELECT * FROM A
;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Insert on test (cost=13336866.42..13336866.45 rows=1 width=438)
CTE a
-> Aggregate (cost=13336866.41..13336866.42 rows=1 width=8)
-> Append (cost=0.00..12405302.12 rows=372625715 width=0)
-> Seq Scan on meter_read mr (cost=0.00..0.00 rows=1
width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Seq Scan on meter_read_2016_03_29 mr_1
(cost=0.00..6201295.24 rows=186240748 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Seq Scan on meter_read_2016_03_28 mr_2
(cost=0.00..6204006.88 rows=186384966 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> CTE Scan on a (cost=0.00..0.03 rows=1 width=438)
(11 rows)

test=# explain
select count(*) into jirotech.jamestest FROM meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2'
day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=13336866.41..13336866.42 rows=1 width=8)
-> Append (cost=0.00..12405302.12 rows=372625715 width=0)
-> Seq Scan on meter_read mr (cost=0.00..0.00 rows=1 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Seq Scan on meter_read_2016_03_29 mr_1 (cost=0.00..6201295.24
rows=186240748 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Seq Scan on meter_read_2016_03_28 mr_2 (cost=0.00..6204006.88
rows=186384966 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
(8 rows)

Time: 29.229 ms

test=# explain
select count(*) FROM meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2'
day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=7631566.71..7631566.72 rows=1 width=8)
-> Gather (cost=7631566.68..7631566.69 rows=8 width=8)
Workers Planned: 8
-> Partial Aggregate (cost=7630566.68..7630566.69 rows=1 width=8)
-> Append (cost=0.00..7514121.14 rows=46578216 width=0)
-> Parallel Seq Scan on meter_read mr
(cost=0.00..0.00 rows=1 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Parallel Seq Scan on meter_read_2016_03_29 mr_1
(cost=0.00..3756417.28 rows=23280094 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
-> Parallel Seq Scan on meter_read_2016_03_28 mr_2
(cost=0.00..3757703.86 rows=23298121 width=0)
Filter: ((timestamp_rounded >= '2016-03-28
00:00:00'::timestamp without time zone) AND (timestamp_rounded <
'2016-03-30 00:00:00'::timestamp without time zone))
(11 rows)

Time: 29.819 ms

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P *(+61) 2 8099 9000 <(+61)%202%208099%209000> *W* www.jirotech.com *F *
(+61) 2 8099 9099 <(+61)%202%208099%209000>

--

------------------------------
The contents of this email are confidential and may be subject to legal or
professional privilege and copyright. No representation is made that this
email is free of viruses or other defects. If you have received this
communication in error, you may not copy or distribute any part of it or
otherwise disclose its contents to anyone. Please advise the sender of your
incorrect receipt of this correspondence.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message melvin6925 2016-11-13 23:43:49 Re: Sequential parallel scan / agg
Previous Message Jeff Janes 2016-11-13 22:57:15 Re: Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists