From: Oliver Seidel <postgresql(at)os10000(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: parallel query evaluation
Date: 2012-11-08 11:55:12
Message-ID: 214a89820818a3da44bcadb4ab829c14@os10000.net
Lists: pgsql-performance
Hi,
I have
    create table x ( att bigint, val bigint, hash varchar(30) );
with 693 million rows. The query
    create table y as
    select att, val, count(*) as cnt from x group by att, val;
ran for more than 2000 minutes and used 14 GB of memory on a machine with
8 GB of physical RAM -- eventually I stopped it. Doing
    create table y ( att bigint, val bigint, cnt int );
and something a bit like:

    seq 0 255 | xargs -P 6 -I{} \
      psql -c "insert into y select att, val, count(*) from x where att % 256 = {} group by att, val" test
runs 6 of the 256 partitions in 10 minutes -- meaning the whole problem can
be done in just under 3 hours.
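
As a cheap completeness check on the partitioned run (just a sketch, using
the tables above): every row of x falls into exactly one att%256 bucket, so
once all 256 inserts have finished, the counts in y should add up to the row
count of x:

    -- after all 256 partitions are done, these two numbers should match
    select (select sum(cnt) from y)  as counted_rows,
           (select count(*) from x)  as actual_rows;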
Question 1: do you see any reason why the second method would yield a
different result from the first method?
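
To make "different result" concrete: with y_serial and y_parallel as
placeholder names for the outputs of the two methods, I would expect the
following comparison to return no rows:

    -- rows present in one result but not the other (both directions)
    (select att, val, cnt from y_serial except select att, val, cnt from y_parallel)
    union all
    (select att, val, cnt from y_parallel except select att, val, cnt from y_serial);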
Question 2: is that method generalisable so that it could be included in
the base system without manual shell glue?
Thanks,
Oliver