queries with lots of UNIONed relations

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: queries with lots of UNIONed relations
Date: 2011-01-13 15:55:31
Message-ID: AANLkTiktjfaR9_sDU+F3UZhAQn7qrBktXE1EiOX=inzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I was recently asked to look into why one particular set of queries
was taking a long time. The queries are all of the same form. They
select the UNION of a few
columns from around 100 tables.

The query in particular was taking some 7-8 minutes to run.

On a whim, I changed the query from this form:

SELECT a, b FROM FOO_a WHERE <conditions>
UNION
SELECT a,b FROM FOO_b WHERE <conditions>
....

to:

SELECT DISTINCT a,b FROM FOO_a WHERE <conditions>
UNION
SELECT DISTINCT a,b FROM FOO_b WHERE <conditions>
...

and the query time dropped to under a minute.

In the former case, the query plan was a bitmap heap scan for each
table. Then those results were Appended, Sorted, Uniqued, Sorted
again, and then returned.

In the latter, before Appending, each table's results were run through
HashAggregate.

The total number of result rows is in the 500K range. Each table holds
approximately 150K matching rows (but this can vary a bit).

What I'm asking is this: since adding DISTINCT to each participating
member of the UNION query reduced the total number of appended rows,
is there some sort of heuristic that postgresql could use to do this
automatically? The 12x speedup was quite nice.

--
Jon

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-01-13 17:13:22 Re: queries with lots of UNIONed relations
Previous Message Jon Nelson 2011-01-13 15:51:29 queries with lots of UNIONed relations