From: | Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | rm_pg(at)cheapcomplexdevices(dot)com, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Left Outer Join much faster than non-outer Join? |
Date: | 2005-03-31 07:04:53 |
Message-ID: | 424BA115.8050606@cheapcomplexdevices.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Tom Lane wrote:
> rm_pg(at)cheapcomplexdevices(dot)com writes:
>
>> select *
>> from streetname_lookup as sl
>> join city_lookup as cl on (true)
>> left outer join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
>> where str_name='alamo' and city='san antonio' and state='TX'
>>;
>
> That's a fairly odd query;
I think it's a very common type of query in data warehousing.
It's reasonably typical of a traditional star schema where
"streetname_lookup" and "city_lookup" are dimension tables
and "tlid_smaller" is the central fact table.
> why don't you have any join condition between
> streetname_lookup and city_lookup?
Those two tables shared no data. They merely get the "id"s
for looking things up in the much larger central table.
Unique indexes on the city_lookup and street_lookup make the
cartesian join harmless (they each return only 1 value); and
the huge fact table has a multi-column index that takes both
of the ids from those lookups.
With the tables I have (shown below), how else could one
efficiently fetch the data for "Main St" "San Francisco"?
streetname_lookup
(for every street name used in the country)
streetid | name | type
----------+--------+------
1 | Main | St
2 | 1st | St
city_lookup
(for every city name used in the country)
cityid | name | state
--------+---------+------
1 | Boston | MA
2 | Alameda| CA
tlid_smaller
(containing a record for every city block in the country)
city_id | street_id | addresses | demographics, etc.
--------+------------+-----------+----------------------
1 | 1 | 100 block | [lots of columns]
1 | 1 | 200 block | [lots of columns]
1 | 1 | 300 block | [lots of columns]
1 | 2 | 100 block | [lots of columns]
1 | 2 | 100 block | [lots of columns]
> The planner won't consider Cartesian joins unless forced to, which is
> why it fails to consider the join order "((sl join cl) join ts)" unless
> you have an outer join in the mix. I think that's generally a good
> heuristic, and am disinclined to remove it ...
IMHO it's a shame it doesn't even consider it when the estimated
results are very small. I think often joins that merely look up
IDs would be useful to consider for the purpose of making potential
multi-column indexes (as shown in the previous email's explain
analyze result where the cartesian join was 30X faster than the
other approach since it could use the multi-column index on the
very large table).
Ron
From | Date | Subject | |
---|---|---|---|
Next Message | Ron Mayer | 2005-03-31 08:15:55 | Re: Left Outer Join much faster than non-outer Join? |
Previous Message | Tom Lane | 2005-03-31 04:07:52 | Re: Left Outer Join much faster than non-outer Join? |