From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: queries with lots of UNIONed relations
Date: 2011-01-13 23:14:04
Message-ID: AANLkTinubA5_CFAdERCyzDy1H3v409pyVTzdvPjTaPpr@mail.gmail.com
Lists: pgsql-performance
On Thu, Jan 13, 2011 at 5:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jan 13, 2011 at 5:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't believe there is any case where hashing each individual relation
>>> is a win compared to hashing them all together. If the optimizer were
>>> smart enough to be considering the situation as a whole, it would always
>>> do the latter.
>
>> You might be right, but I'm not sure. Suppose that there are 100
>> inheritance children, and each has 10,000 distinct values, but none of
>> them are common between the tables. In that situation, de-duplicating
>> each individual table requires a hash table that can hold 10,000
>> entries. But deduplicating everything at once requires a hash table
>> that can hold 1,000,000 entries.
>
>> Or am I all wet?
>
> If you have enough memory to de-dup them individually, you surely have
> enough to de-dup all at once. It is not possible for a single hashtable
> to have worse memory consumption than N hashtables followed by a union
> hashtable, and in fact if there are no common values then the latter eats
> twice as much space because every value appears twice in two different
> hashtables.
If everything were available up-front, sure.

However, correct me if I'm wrong, but doesn't PostgreSQL work in a
fairly linear fashion, moving from table to table and performing a
series of operations on each? That is what the plan seems to indicate.
Compare:

    for each table LOOP
        scan table for result rows, append to results
    END LOOP
    hash / sort + unique results

versus:

    for each table LOOP
        scan table for result rows, append to table-results
        hash / sort + unique table-results, append to results
    END LOOP
    hash / sort + unique results
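
In SQL terms, the two shapes would look roughly like this (a sketch
with hypothetical tables t1 and t2 and a column id, not taken from any
actual query in this thread):

    -- all-at-once: append every relation's rows, then one big de-dup
    SELECT id FROM t1
    UNION
    SELECT id FROM t2;

    -- per-relation: de-duplicate each relation first, then de-dup the
    -- (hopefully much smaller) appended results
    SELECT id FROM (SELECT DISTINCT id FROM t1) AS d1
    UNION
    SELECT id FROM (SELECT DISTINCT id FROM t2) AS d2;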
In the former case, all of the result rows from all tables are
appended together before de-duplication can start.

In the latter case, only enough memory for one table's result set is
needed for de-duplication at a time, and it only has to be allocated
while that table is being processed.
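
To put numbers on Robert's example above: with 100 children holding
10,000 distinct values each, the per-table hashtables need only about
10,000 entries apiece, built and thrown away one table at a time; only
the final pass over the appended results has to deal with all
1,000,000 values at once.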
Is that not how this works?
--
Jon