From: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
---|---|
To: | Mark Dilger <hornschnorter(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hash support for grouping sets |
Date: | 2017-03-23 20:07:38 |
Message-ID: | 87tw6j94y7.fsf@news-spur.riddles.org.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>>>>> "Mark" == Mark Dilger <hornschnorter(at)gmail(dot)com> writes:
Mark> Is there a performance test case where this patch should shine
Mark> brightest? I'd like to load a schema with lots of data, and run
Mark> a grouping sets query, both before and after applying the patch,
Mark> to see what the performance advantage is.
The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.
So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):
create table sales (
id serial,
product_id integer,
store_id integer,
customer_id integer,
qty integer);
-- random integer function
create function d(integer) returns integer language sql
as $f$ select floor(random()*$1)::integer + 1; $f$;
-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
select d(20), d(6), d(10), d(100) from generate_series(1,10000000);
-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
from sales
group by cube(product_id, store_id);
-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
from sales
group by cube(product_id, store_id, customer_id);
-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
from sales
group by cube(product_id, store_id, customer_id, (qty / 10));
On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).
--
Andrew (irc:RhodiumToad)
From | Date | Subject | |
---|---|---|---|
Next Message | Teodor Sigaev | 2017-03-23 20:08:52 | Re: Potential data loss of 2PC files |
Previous Message | Mithun Cy | 2017-03-23 19:52:34 | Re: [POC] A better way to expand hash indexes. |