Re: Tuning/performance issue...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Griffiths <dgriffiths(at)boats(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Tuning/performance issue...
Date: 2003-10-01 14:14:26
Message-ID: 28832.1065017666@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

David Griffiths <dgriffiths(at)boats(dot)com> writes:
>> The reason PG's planner doesn't discover this join order for itself
>> is that it's written to not attempt to re-order outer joins from the
>> syntactically defined ordering. In general, such reordering would
>> change the results. It is possible to analyze the query and prove that
>> certain reorderings are valid (don't change the results), but we don't
>> currently have code to do that.

> Not sure I follow. Are you saying that, depending on when the outer-join is
> applied to the rows found at the time, you may end up with a different set
> of rows?

Here's an example showing that it's not always safe to rearrange join
order in the presence of outer joins:

jtest=# create table a (f1 int);
CREATE TABLE
jtest=# create table b (f1 int, f2 int);
CREATE TABLE
jtest=# create table c(f1 int, f2 int);
CREATE TABLE
jtest=# insert into a values (1);
INSERT 431307 1
jtest=# insert into b values (10,10);
INSERT 431308 1
jtest=# insert into b values (11,11);
INSERT 431309 1
jtest=# insert into c values (1,10);
INSERT 431310 1
jtest=# insert into c values (2,11);
INSERT 431311 1

jtest=# SELECT * FROM a, b LEFT JOIN c ON b.f2 = c.f2 WHERE a.f1 = c.f1;
f1 | f1 | f2 | f1 | f2
----+----+----+----+----
1 | 10 | 10 | 1 | 10
(1 row)

Per spec the JOIN operator binds more tightly than comma, so this is
equivalent to:

jtest=# SELECT * FROM a JOIN (b LEFT JOIN c ON b.f2 = c.f2) ON a.f1 = c.f1;
f1 | f1 | f2 | f1 | f2
----+----+----+----+----
1 | 10 | 10 | 1 | 10
(1 row)

Now suppose we try to join A and C before joining to B:

jtest=# SELECT * FROM b LEFT JOIN (a join c ON a.f1 = c.f1) ON b.f2 = c.f2;
f1 | f2 | f1 | f1 | f2
----+----+----+----+----
10 | 10 | 1 | 1 | 10
11 | 11 | | |
(2 rows)

We get a different answer, because some C rows are eliminated before
reaching the left join, causing null-extended B rows to be added.

(I don't have a MySQL installation here to try, but if they still work
the way they used to, they get the wrong answer on the first query.)

The point of this example is just that there are cases where it'd be
incorrect for the planner to change the ordering of joins from what
is implied by the query syntax. It is always safe to change the join
order when only inner joins are involved. There are cases where outer
join order is safe to change too, but you need analysis code that checks
the query conditions to prove that a particular rearrangement is safe.
Right now, we don't have such code, and so we just follow the simple
rule "never rearrange any outer joins".

> I would have expected the optimizer to do the outer-joins last, as the
> extra data received by the outer-joins is not mandatory, and won't
> affect the rows that were retreived by joining user_account,
> address_list, and commercial_entity.

I think your example falls into the category of provably-safe
rearrangements ... but as I said, the planner doesn't know that.

> An outer join would *never* be the most restrictive
> join in a query.

Sure it can, if the restriction conditions are mainly on the outer
join's tables. But that's not really the issue here. As best I can
tell without seeing your data statistics, the most restrictive
conditions in your query are the ones on
commercial_entity.commercial_entity_id and user_account.user_role_id.
The trick is to apply those before joining any other tables.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jason Hihn 2003-10-01 15:13:15 Ideal Hardware?
Previous Message Andrew Sullivan 2003-10-01 13:25:36 Re: inferior SCSI performance