Re: queries with lots of UNIONed relations

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: queries with lots of UNIONed relations
Date: 2011-01-13 23:14:04
Message-ID: AANLkTinubA5_CFAdERCyzDy1H3v409pyVTzdvPjTaPpr@mail.gmail.com
Lists: pgsql-performance

On Thu, Jan 13, 2011 at 5:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jan 13, 2011 at 5:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't believe there is any case where hashing each individual relation
>>> is a win compared to hashing them all together.  If the optimizer were
>>> smart enough to be considering the situation as a whole, it would always
>>> do the latter.
>
>> You might be right, but I'm not sure.  Suppose that there are 100
>> inheritance children, and each has 10,000 distinct values, but none of
>> them are common between the tables.  In that situation, de-duplicating
>> each individual table requires a hash table that can hold 10,000
>> entries.  But deduplicating everything at once requires a hash table
>> that can hold 1,000,000 entries.
>
>> Or am I all wet?
>
> If you have enough memory to de-dup them individually, you surely have
> enough to de-dup all at once.  It is not possible for a single hashtable
> to have worse memory consumption than N hashtables followed by a union
> hashtable, and in fact if there are no common values then the latter eats
> twice as much space because every value appears twice in two different
> hashtables.

If everything were available up-front, sure.
However (and please correct me if I'm wrong), doesn't PostgreSQL
work in a fairly linear fashion, moving from table to table and
performing a series of operations on each? That seems to indicate
the plan is:

Compare:

for each table LOOP
scan table for result rows, append to results
END LOOP
hash / sort + unique results

versus:

for each table LOOP
scan table for result rows, append to table-results
hash / sort+unique table-results, append to results
END LOOP
hash / sort + unique results

In the former case, all of the result rows from all tables are
appended together before de-duplication can start.

In the latter case, only enough memory for each table's result set is
needed for de-duplication, and it only has to be allocated while that
table is being processed.

Is that not how this works?
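The two loops above can be sketched as a toy simulation (my own sketch,
not PostgreSQL internals; function names are hypothetical), using
Robert's numbers upthread: 100 child tables with 10,000 distinct values
each and no overlap. Note that with fully disjoint values the final
unique pass still sees all 1,000,000 rows in both cases; the per-table
hash tables only bound the transient memory used at each step.

```python
def dedup_all_at_once(tables):
    """Former plan: append every table's rows, then one hash/unique pass."""
    results = []
    for t in tables:
        results.extend(t)          # scan table for result rows, append to results
    return set(results)            # single hash / sort + unique at the end

def dedup_per_table(tables):
    """Latter plan: hash/unique each table's rows, then a final pass."""
    results = []
    for t in tables:
        results.extend(set(t))     # per-table hash table, freed after each table
    return set(results)            # final hash / sort + unique over the union

# 100 child tables, 10,000 distinct values each, disjoint ranges.
tables = [range(i * 10_000, (i + 1) * 10_000) for i in range(100)]
assert len(dedup_all_at_once(tables)) == 1_000_000
assert len(dedup_per_table(tables)) == 1_000_000
```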

--
Jon
