From: | David Griffiths <dgriffiths(at)boats(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Tuning/performance issue... |
Date: | 2003-10-01 05:48:54 |
Message-ID: | 074601c387df$b5939270$6501a8c0@griffiths2 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> The most efficient way to handle this query would probably be to join
> the three tables with restrictions first, and then join the other tables
> to those. You could force this with not too much rewriting using
> something like (untested, but I think it's right)
>
> ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
> address_list LEFT JOIN state_province ON address_list.state_province_id
> = state_province.state_province_id
> LEFT JOIN contact_info ON address_list.contact_info_id =
> contact_info.contact_info_id
> CROSS JOIN country
> WHERE ...
>
> The explicit JOINs associate left-to-right, so this gives the intended
> join order. (In your original query, explicit JOIN binds more tightly
> than commas do.)
Ok - that's interesting - I'll have to do some reading and more testing.
> The reason PG's planner doesn't discover this join order for itself
> is that it's written to not attempt to re-order outer joins from the
> syntactically defined ordering. In general, such reordering would
> change the results. It is possible to analyze the query and prove that
> certain reorderings are valid (don't change the results), but we don't
> currently have code to do that.
Not sure I follow. Are you saying that, depending on when the outer-join is
applied to the rows found at the time, you may end up with a different set
of rows? I would have expected the optimizer to do the outer-joins last, as
the extra data received by the outer-joins is not mandatory, and won't
affect
the rows that were retreived by joining user_account, address_list, and
commercial_entity.
An outer join would *never* be the most restrictive
join in a query. I thought (from my readings on Oracle query tuning) that
finding the most restrictive table/index was the first task of an optimizer.
Reduce the result set as quickly as possible. That query has the line,
"AND commercial_entity.commercial_entity_id=225528",
which uses an index (primary key) and uses an "=". I would have expected
that to be done first, then joined with the other inner-join tables, and
finally
have the outer-joins applied to the final result set to fill in the "might
be there" data.
Anyway, if the optimizer does the outer-joins first (address_list with
state_province
and contact_info), then it's picking the table with the most rows
(address_list has
200K+ rows, where the other 3 big tables have 70K-90K). Would re-ordering
the FROM clause (and LEFT JOIN portions) help?
Could you give an example where applying an outer-join at a different time
could
result in different results? I think I can see at situation where you use
part of the results
in the outer-join in the where clause, but I am not sure.
> I'm prepared to believe that Oracle contains code that actually does the
> analysis about which outer-join reorderings are valid, and is then able
> to find the right join order by deduction.
I'm not sure about Oracle (other than what I stated above). In fact, about
half
the time, updating table stats to try to get the Oracle optimizer to do a
better
job on a query results in even worse performance.
> ... FROM ((((commercial_entity CROSS JOIN country) CROSS JOIN
> user_account) CROSS JOIN address_list)
> LEFT JOIN state_province ON ...)
> LEFT JOIN contact_info ON ...
> WHERE ...
>
> This is clearly at odds with the SQL spec's syntactically defined join
> order semantics. It's possible that it always yields the same results
> as the spec requires, but I'm not at all sure about that.
Again, I don't know. On the 3 queries based on these tables, Postgres
and MySQL return the exact same data (they use the same data set).
Do you have a link to the SQL spec's join-order requirements?
> In any case
> this strategy is certainly not "better" than ours, it just performs
> poorly on a different set of queries. Would I be out of line to
> speculate that your query was previously tuned to work well in MySQL?
The query was pulled from our codebase (written for Oracle). I added a bit
to it
to make it slower, and then ported to MySQL and tested there first (just
re-wrote
the outer-join syntax). I found that re-ordering the tables in the
from-clause on
MySQL changed the time by 45-ish% (0.36 seconds to .20 seconds), but that's
because I had forgotten to re-analyze the tables after refreshing the
dataset.
Now, table order doesn't make a difference in speed (or results).
If anything, I've done more tuning for Postgres - added some extra indexes
to try to help
(country.country_id had a composite index with another column, but not an
index for
just it), etc.
The dataset and schema is pure-Oracle. I extracted it out of the database,
removed all
Oracle-specific extensions, changed the column types, and migrated the
indexes and
foreign keys to MySQL and Postgres. Nothing more (other than an extra index
or two for Postgres - nada for MySQL).
This is all part of a "migrate away from Oracle" project. We are looking at
3 databases -
MySQL (InnoDB), Postgres and Matisse (object oriented). We have alot of
queries like this
or worse, and I'm worried that many of them would need to be re-written. The
developers
know SQL, but nothing about tuning, etc.
Thanks for the quick response - I will try explicit joining, and I'm looking
forward to
your comments on outer-joins and the optmizer (and anything else I've
written).
David.
From | Date | Subject | |
---|---|---|---|
Next Message | Shridhar Daithankar | 2003-10-01 07:05:30 | Re: Tuning/performance issue.... |
Previous Message | Tom Lane | 2003-10-01 04:28:50 | Re: Tuning/performance issue... |