From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Huge Data sets, simple queries
Date: 2006-01-28 17:37:00
Message-ID: 18415.1138469820@sss.pgh.pa.us
Lists: pgsql-performance
"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> On Sat, 2006-01-28 at 10:55 -0500, Tom Lane wrote:
>> Assuming that "month" means what it sounds like, the above would result
>> in running twelve parallel sort/uniq operations, one for each month
>> grouping, to eliminate duplicates before counting. You've got sortmem
>> set high enough to blow out RAM in that scenario ...
> Hrmm, why is it that with a similar query I get a far simpler plan than
> you describe, and relatively snappy runtime?
You can't see the sort operations in the plan, because they're invoked
implicitly by the GroupAggregate node. But they're there.
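For example (hypothetical table and column names, purely illustrative):

    -- The Sort node visible under the GroupAggregate orders the input
    -- for the GROUP BY; the sort/uniq that implements each group's
    -- DISTINCT runs inside the GroupAggregate node itself and never
    -- appears as a separate plan node.
    EXPLAIN
    SELECT order_date, count(DISTINCT customer_id)
      FROM orders
     GROUP BY order_date;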
Also, a plan involving GroupAggregate is going to run the "distinct"
sorts sequentially, because it's dealing with only one grouping value at
a time. In the original case, the planner probably realizes there are
only 12 groups and therefore prefers a HashAggregate, which will try
to run all the sorts in parallel. Your "group by date" isn't a good
approximation of the original conditions because there will be a lot
more groups.
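To make the contrast concrete (again with made-up names):

    -- ~12 groups: the planner is apt to pick HashAggregate, which keeps
    -- one in-progress DISTINCT sort per group alive at the same time;
    -- with sortmem set high, that can mean 12 x sortmem in use at once.
    SELECT month, count(DISTINCT customer_id)
      FROM orders GROUP BY month;

    -- Hundreds of groups: a GroupAggregate is likelier, and it finishes
    -- each group's DISTINCT sort before starting the next, so peak
    -- memory stays around a single sortmem's worth.
    SELECT order_date, count(DISTINCT customer_id)
      FROM orders GROUP BY order_date;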
(We might need to tweak the planner to discourage selecting
HashAggregate in the presence of DISTINCT aggregates --- I don't
remember whether it accounts for the sortmem usage in deciding
whether the hash will fit in memory or not ...)
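(A possible stopgap for anyone bitten by this in the meantime, using the
same made-up query as above; enable_hashagg is a standard planner GUC:

    -- Steer the planner away from HashAggregate for just this query,
    -- so the per-group DISTINCT sorts run one at a time:
    SET enable_hashagg = off;
    SELECT month, count(DISTINCT customer_id)
      FROM orders GROUP BY month;
    RESET enable_hashagg;

or simply lower sortmem for the session.)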
regards, tom lane