Re: queries with lots of UNIONed relations

From:	Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andy Colson <andy(at)squeakycode(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: queries with lots of UNIONed relations
Date:	2011-01-13 22:53:22
Message-ID:	AANLkTikRTCXdvDVvQXV0ohy9aruncwL=qgcDYyk=VLPX@mail.gmail.com
Lists:	pgsql-performance

On Thu, Jan 13, 2011 at 4:49 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jan 13, 2011 at 5:47 PM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
>>>>> I don't believe there is any case where hashing each individual relation
>>>>> is a win compared to hashing them all together. If the optimizer were
>>>>> smart enough to be considering the situation as a whole, it would always
>>>>> do the latter.
>>>>
>>>> You might be right, but I'm not sure. Suppose that there are 100
>>>> inheritance children, and each has 10,000 distinct values, but none of
>>>> them are common between the tables. In that situation, de-duplicating
>>>> each individual table requires a hash table that can hold 10,000
>>>> entries. But deduplicating everything at once requires a hash table
>>>> that can hold 1,000,000 entries.
>>>>
>>>> Or am I all wet?
>>>
>>> Yeah, I'm all wet, because you'd still have to re-de-duplicate at the
>>> end. But then why did the OP get a speedup? *scratches head*
>>
>> Because it all fit in memory and didn't swap to disk?
>
> Doesn't make sense. The re-de-duplication at the end should use the
> same amount of memory regardless of whether the individual relations
> have already been de-duplicated.

I don't believe that to be true.
Assume 100 tables each of which produces 10,000 rows from this query.
Furthermore, let's assume that there are 3,000 duplicates per table.

Without DISTINCT:
uniqify (100 * 10,000 = 1,000,000 rows)

With DISTINCT:
uniqify (100 * (10,000 - 3,000) = 700,000 rows)

That's 300,000 fewer rows; at (say) 64 bytes/row that is 19,200,000 bytes, or about 18.3MB.
Not a lot, but more than the work_mem of 16MB.
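
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It only counts the rows fed into the final uniqify step and multiplies by an assumed row width; the 64 bytes/row figure and the helper name rows_into_final_uniqify are illustrative assumptions, and it ignores any per-entry overhead the executor adds.

```python
# Back-of-the-envelope check of the row counts and memory difference.
# Assumptions (not measured): 64 bytes per row, work_mem = 16MB,
# no hash-table or sort overhead accounted for.

ROW_BYTES = 64                       # assumed average row width
WORK_MEM_BYTES = 16 * 1024 * 1024    # work_mem of 16MB


def rows_into_final_uniqify(tables, rows_per_table, dups_per_table, per_table_distinct):
    """Rows reaching the final de-duplication step (hypothetical helper)."""
    if per_table_distinct:
        # Each table drops its own duplicates before the UNION.
        return tables * (rows_per_table - dups_per_table)
    return tables * rows_per_table


without_distinct = rows_into_final_uniqify(100, 10_000, 3_000, per_table_distinct=False)
with_distinct = rows_into_final_uniqify(100, 10_000, 3_000, per_table_distinct=True)

saved_rows = without_distinct - with_distinct
saved_bytes = saved_rows * ROW_BYTES

print(f"without DISTINCT: {without_distinct:,} rows")
print(f"with DISTINCT:    {with_distinct:,} rows")
print(f"difference:       {saved_rows:,} rows = {saved_bytes / (1024 * 1024):.1f} MiB")
print(f"exceeds work_mem: {saved_bytes > WORK_MEM_BYTES}")
```

Under these assumptions the difference alone is larger than work_mem, which is the point I'm trying to make; real numbers depend on the actual row width and on how the planner de-duplicates.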

Or maybe *I'm* all wet?

--
Jon

In response to

Re: queries with lots of UNIONed relations at 2011-01-13 22:49:15 from Robert Haas
