Re: omitting redundant join predicate

From: Ehab Galal <ehabgalal123(at)hotmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: omitting redundant join predicate
Date: 2007-11-05 15:15:36
Message-ID: BAY138-W453548E0FFC1D11A00D2AD96880@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Sorry for not being clear enough. What i meant is how aware the optimizer is about the transitivity of operators.

I agree that the more join clauses a query gets, the more flexibility the optimizer gets to pick an optimal plan.

what i expected is that the optimizer will use the redundant predicates
to create the plan, but the execution plan itself will not execute a
redundant predicate.

O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality and it is working well :)

Nested Loop (cost=0.00..3.14 rows=1 width=368)

Join Filter: ("outer".username = "inner".username)

-> Nested Loop (cost=0.00..2.10 rows=1 width=218)

Join Filter: ("outer".username = "inner".username)

-> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=146)

-> Seq Scan on t3 (cost=0.00..1.04 rows=4 width=72)

-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=150)

I am using postgresql 8.5.1, I am wondering is there is any patch that
i can run to enable it to put the actual table names instead of
inner/outer. Should i post this to the hackers mailing list?

Thanks a lot.

> To: ehabgalal123(at)hotmail(dot)com
> CC: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] omitting redundant join predicate
> Date: Sun, 4 Nov 2007 11:35:36 -0500
> From: tgl(at)sss(dot)pgh(dot)pa(dot)us
>
> Ehab Galal <ehabgalal123(at)hotmail(dot)com> writes:
> > explain select *
> > from t1, t2, t3
> > where t1.f <= t2.f
> > and t2.f <= t3.f
> > and t1.f <= t3.f;
>
> > I was wondering if there is a
> > way to omit the redundant join predicate.
>
> You're not being very clear here. Do you mean will you get the same
> answer if you omit "t1.f <= t3.f"? Yes, of course (ignoring possibly
> different output ordering). Do you mean you think the system should
> discard it as redundant? I disagree --- the more join clauses the
> better, as a rule. Do you mean that the EXPLAIN output looks like
> the same comparison is being applied twice? It isn't --- in a more
> modern PG release the output looks like this:
>
> QUERY PLAN
> ------------------------------------------------------------------
> Nested Loop (cost=33.54..81794021.44 rows=362975624 width=12)
> Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f))
> -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8)
> Join Filter: (t1.f <= t3.f)
> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4)
> -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=4)
> -> Materialize (cost=33.54..54.94 rows=2140 width=4)
> -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=4)
> (8 rows)
>
> This is of course the stupidest possible join plan, but it's hard to do
> much better --- both hash and merge joins work only on equality
> conditions. You can do a bit better with an index on t2.f:
>
> QUERY PLAN
> ----------------------------------------------------------------------
> Nested Loop (cost=0.00..13222230.60 rows=362975624 width=12)
> -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8)
> Join Filter: (t1.f <= t3.f)
> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4)
> -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=4)
> -> Index Scan using t2i on t2 (cost=0.00..5.01 rows=238 width=4)
> Index Cond: ((t1.f <= t2.f) AND (t2.f <= t3.f))
> (7 rows)
>
> regards, tom lane

_________________________________________________________________
Help yourself to FREE treats served up daily at the Messenger Café. Stop by today.
http://www.cafemessenger.com/info/info_sweetstuff2.html?ocid=TXT_TAGLM_OctWLtagline

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Gregory Stark 2007-11-05 15:27:03 Re: Returning the total number of rows as a separate column when using limit
Previous Message Tom Lane 2007-11-05 15:00:53 Re: Returning the total number of rows as a separate column when using limit