Quick Links

Re: Thinking about IN/EXISTS optimization

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Thinking about IN/EXISTS optimization
Date:	2002-10-23 02:38:31
Message-ID:	200210230238.g9N2cVe07179@candle.pha.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

This sounds like one of those classic optimizer problems we have had to
deal with in the past. I suggest you go through the optimizer pass and
set a boolean in Query whenever you do something that may require
another loop through, then at the end, you check the boolean and loop if
required.

I think the rules system has to do something similar. I don't see any
way around that, but because you are setting the boolean you only loop
when you need to.

---------------------------------------------------------------------------

Tom Lane wrote:
> I've been thinking about how to convert "x IN (subselect)" and EXISTS
> constructs into join-like processing, and I've run into a small problem
> in getting the planner to do it nicely. The issue is that I need to
> take the subselect and push it into the jointree --- essentially, make
> it look like a subselect-in-FROM --- so that the join planner can deal
> with it. Basically, I need to rearrange
>
> SELECT ... FROM ... WHERE ... AND x IN (SELECT y FROM ...)
>
> into
>
> SELECT ... FROM ..., (SELECT y FROM ...) ss
> WHERE ... AND x =* ss.y
>
> where =* represents some specially-marked RestrictInfo node. (NOT IN is the
> same except that the RestrictInfo node will be marked differently.)
>
> The difficulty is that there's no good place to do this in
> subquery_planner(). We should push the subselect into FROM before we
> run the pull_up_subqueries() and preprocess_jointree() operations;
> if we don't pull up the subselect into the main query then we won't have
> accomplished very much. But the WHERE clause isn't simplified into a
> form that makes it easy to spot top-level IN() expressions until after
> that. We can't simply switch the order of the subselect and
> WHERE-clause processing, because pulling up subqueries typically adds
> conditions to the WHERE clause.
>
> I haven't been able to think of a solution to this that doesn't involve
> wasting a lot of cycles by repeating some of these processing steps,
> or missing some optimization possibilities. (For example, if we pull up
> a subquery that came from a view, it might contain an IN where-clause,
> which ideally we'd want to be able to optimize. It almost seems like
> we need to be able to loop around the whole operation; but most of the
> time this will just waste cycles.)
>
> Anyone see a nice way to do this?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Thinking about IN/EXISTS optimization at 2002-10-22 23:18:11 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Philip Warner	2002-10-23 02:39:23	Re: pg_dump and large files - is this a problem?
Previous Message	Bruce Momjian	2002-10-23 01:16:17	Re: One 7.3 item left