GroupAggregate and Integer Arrays

From: David Osborne <david(at)qcode(dot)co(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: GroupAggregate and Integer Arrays
Date: 2015-10-23 14:29:17
Message-ID: CAKmpXCcjcni8XwUP2HPPpTk91FmTSk-jtoHkwXrC-mcYM+K4_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

Wondering if anyone could suggest how we could improve the performance of
this type of query?
The intensive part is the summing of integer arrays as far as I can see.
We're thinking there's not much we can do to improve performance apart from
throw more CPU at it... would love to be proven wrong though!

*Query:*

explain (analyse,buffers)
select
sum(s2.array_a),sum(s2.array_b)
from mytable s1 left join mytable s2
on s1.code=s2.code and s1.buyer=s2.seller and s2.seller='XX'
where s1.buyer='XX'
group by s1.buyer,s1.code
;

*Depesz Explain Link:*

http://explain.depesz.com/s/m3XP

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=275573.49..336223.36 rows=2547 width=524) (actual
time=1059.340..22946.772 rows=22730 loops=1)
Buffers: shared hit=113596 read=1020 dirtied=15
-> Merge Left Join (cost=275573.49..278850.09 rows=113560 width=524)
(actual time=1058.773..1728.186 rows=240979 loops=1)
Merge Cond: ((s1.code)::text = (s2.code)::text)
Join Filter: (s1.buyer = (s2.seller)::bpchar)
Buffers: shared hit=113596 read=1020 dirtied=15
-> Index Only Scan using mytable_buyer_idx on mytable s1
(cost=0.42..1226.06 rows=25465 width=12) (actual time=0.015..35.790
rows=22730 loops=1)
Index Cond: (buyer = 'XX'::bpchar)
Heap Fetches: 3739
Buffers: shared hit=16805 dirtied=1
-> Sort (cost=275573.07..275818.33 rows=98106 width=525) (actual
time=1058.736..1141.560 rows=231662 loops=1)
Sort Key: s2.code
Sort Method: quicksort Memory: 241426kB
Buffers: shared hit=96791 read=1020 dirtied=14
-> Bitmap Heap Scan on mytable s2
(cost=12256.28..267439.07 rows=98106 width=525) (actual
time=60.330..325.730 rows=231662 loops=1)
Recheck Cond: ((seller)::text = 'XX'::text)
Filter: ((seller)::bpchar = 'XX'::bpchar)
Buffers: shared hit=96791 read=1020 dirtied=14
-> Bitmap Index Scan on mytable_seller_idx
(cost=0.00..12231.75 rows=254844 width=0) (actual time=40.474..40.474
rows=233244 loops=1)
Index Cond: ((seller)::text = 'XX'::text)
Buffers: shared hit=30 read=1020
Total runtime: 22968.292 ms
(22 rows)

*Table size:*

=> select count(*) from mytable;
count
--------
602669
(1 row)

*Array types:*

# select array_a,array_b from mytable limit 1;
array_a | array_b
---------------------------+---------------------------
{0,0,0,0,0,0,0,0,0,0,0,0} | {0,0,0,0,0,0,0,0,0,0,0,0}

*Example schema:*

# \d mytable
Table "public.mytable"
Column | Type | Modifiers
-------------------+-----------------------+------------------------
buyer | character(2) | not null
code | character varying(20) | not null
seller | character varying(50) |
array_a | integer[] |
array_b | integer[] |
Indexes:
"mytable_buyer_code_idx" UNIQUE, btree (buyer, code) CLUSTER
"mytable_buyer_idx" btree (buyer)
"mytable_code_idx" btree (code)
"mytable_seller_idx" btree (seller)

*Version:*

> SELECT version() ;
version

--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3
20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)

This is running on an AWS RDS instance.

Thanks for any pointers
--
David

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2015-10-23 16:15:42 Re: GroupAggregate and Integer Arrays
Previous Message vincent elschot 2015-10-22 14:48:14 Re: Recursive query performance issue