From: | Gregory Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
Cc: | "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Sorting Improvements for 8.4 |
Date: | 2007-12-20 02:45:49 |
Message-ID: | 87sl1y9j8i.fsf@oxford.xeocode.com |
Lists: | pgsql-hackers |
"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
>> Note that speeding up a query from 20s to 5s isn't terribly useful. If it's
>> OLTP you can't be using all your cores for each user anyways. And if it's
>> DSS 20s isn't a problem.
>
> Unless (of course) there are 20,000 users doing the queries that would take 20
> seconds but now they take 5 (when run single-user). They will still have a bit
> of a wait, of course.
I'm not exactly following. If you have 20,000 users then you're probably using
all the processors already. Processing the queries one at a time, each spread
across 4 cores and finishing in 5s, gives the same throughput as running them
four at a time, each on 1 core and finishing in 20s.
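The throughput arithmetic can be made concrete with a small Python sketch. It assumes ideal linear speedup with zero parallelization overhead, which is an idealization and not a measurement. The 4-core, 5s and 20s figures come from the example above, and the `throughput` helper is purely illustrative.

```python
def throughput(queries: int, seconds: float) -> float:
    """Completed queries per second."""
    return queries / seconds

CORES = 4

# Strategy A: each query is spread across all 4 cores and takes 5 s,
# so queries complete one at a time.
one_at_a_time = throughput(1, 5.0)

# Strategy B: each query runs on a single core and takes 20 s,
# with 4 queries running concurrently (one per core).
four_at_a_time = throughput(CORES, 20.0)

print(one_at_a_time, four_at_a_time)
```

Both strategies finish 0.2 queries per second. Parallelism lowers each query's latency, but it does not raise the machine's total throughput.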
>> Where parallel processing like this becomes attractive is when you're
>> running a 2 hour query on a machine sequentially running scheduled batch
>> jobs which can be sped up to 30 minutes. But in that case you're almost
>> certainly being limited by your disk bandwidth, not your cpu speed.
>
> A linear speedup of 2 or more is always worthwhile[*]. Since sorting (e.g. for
> 'group by' and 'order by') and sort joins are a major database task, I guess
> that a linear speedup by a factor of 2 might make the database operations on
> the whole be 10% faster or so {OK, it's a SWAG}. I guess it would look good on
> the benchmarks, if nothing else.
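Amdahl's law offers a rough check of the "10% faster overall" guess. The Python sketch below reads "10% faster" as a 1.10x overall speedup. The share of runtime spent sorting is a hypothetical parameter because the thread gives no figure for it.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

def fraction_needed(overall: float, factor: float) -> float:
    """Share of runtime that must be sped up by `factor` to reach `overall`."""
    # Solve 1/overall = (1 - f) + f/factor for f.
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / factor)

# Hypothetical shares of total runtime spent in sorts.
for f in (0.1, 0.2, 0.5):
    print(f, round(amdahl_speedup(f, 2.0), 3))

print(round(fraction_needed(1.10, 2.0), 3))
```

With a 2x faster sort, the overall gain reaches 10% only if sorting accounts for roughly 18% of total runtime or more.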
Except note that you're not getting this linear speedup for free. To get a
linear speedup of 2x you'll be using more than 2x the CPU resources. If there
is nothing else contending for that resource (such as the scenario I described
where you're running a single large batch query on a system and want to use
all available resources to run it as fast as possible), then you'll get a 2x
speedup.
But if there is more than one query running on the system then you're not
actually gaining anything. Each query will run faster but you won't be able to
run as many simultaneously without having them slow back down. And the
overhead of parallelizing the query will be a net loss.
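A toy cost model in Python shows the trade-off: parallelizing reduces latency on an idle machine but reduces throughput on a saturated one. The 25% coordination overhead is a hypothetical figure chosen for illustration. `QueryPlan` and `saturated_throughput` are illustrative names and do not come from PostgreSQL.

```python
from dataclasses import dataclass

@dataclass
class QueryPlan:
    work: float      # CPU-seconds the query needs when run serially
    workers: int     # cores the query is spread across
    overhead: float  # extra CPU fraction spent coordinating workers (hypothetical)

    def cpu_seconds(self) -> float:
        # Total CPU consumed: the serial work plus the coordination overhead.
        return self.work * (1.0 + self.overhead)

    def latency(self) -> float:
        # Wall-clock time for one query on an otherwise idle machine.
        return self.cpu_seconds() / self.workers

def saturated_throughput(plan: QueryPlan, cores: int) -> float:
    """Queries per second when enough queries are queued to keep every core busy."""
    return cores / plan.cpu_seconds()

CORES = 4
serial = QueryPlan(work=20.0, workers=1, overhead=0.0)
parallel = QueryPlan(work=20.0, workers=4, overhead=0.25)

print(serial.latency(), parallel.latency())
print(saturated_throughput(serial, CORES), saturated_throughput(parallel, CORES))
```

On an idle box the parallel plan finishes in 6.25s instead of 20s. Under full load it completes only 0.16 queries/s versus 0.2, because the overhead consumes CPU that other queries could have used.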
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!