From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: Pgsql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Group by more efficient than distinct?
Date: 2008-04-22 13:04:30
Message-ID: 480DE25E.4080507@mark.mielke.cc
Lists: pgsql-performance
Matthew Wakeling wrote:
> On Tue, 22 Apr 2008, Mark Mielke wrote:
>> The poster I responded to said that the memory required for a hash
>> join was relative to the number of distinct values, not the number of
>> rows. They gave an example of millions of rows, but only a few
>> distinct values. Above, you agree with me that it it would include
>> the rows (or at least references to the rows) as well. If it stores
>> rows, or references to rows, then memory *is* relative to the number
>> of rows, and millions of records would require millions of rows (or
>> row references).
>
> Yeah, I think we're talking at cross-purposes, due to hash tables
> being used in two completely different places in Postgres. Firstly,
> you have hash joins, where Postgres loads the references to the actual
> rows, and puts those in the hash table. For that situation, you want a
> small number of rows. Secondly, you have hash aggregates, where
> Postgres stores an entry for each "group" in the hash table, and does
> not store the actual rows. For that situation, you can have a
> bazillion individual rows, but only a small number of distinct groups.
That makes sense with my reality. :-)
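
For illustration, here is a minimal sketch of the two cases Matthew describes, using hypothetical tables big_table and small_table; whether the planner actually picks a hash aggregate or a hash join depends on its cost estimates:

    -- Bazillions of rows, but only a few distinct values of grp.
    CREATE TABLE big_table (grp integer, payload text);
    CREATE TABLE small_table (grp integer, label text);

    -- Hash aggregate: one hash-table entry per group, not per row,
    -- so memory scales with the number of distinct values of grp.
    EXPLAIN SELECT grp, count(*) FROM big_table GROUP BY grp;

    -- Hash join: the hashed (inner) side's rows go into the hash table,
    -- so memory scales with the number of rows on that side; the planner
    -- prefers to hash the smaller relation.
    EXPLAIN SELECT b.grp, s.label
    FROM big_table b JOIN small_table s ON s.grp = b.grp;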
Thanks,
mark
--
Mark Mielke <mark(at)mielke(dot)cc>