PlaceHolderVars versus join ordering

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: PlaceHolderVars versus join ordering
Date: 2010-09-27 17:15:40
Message-ID: 16419.1285607740@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been looking at the bug reported by Kirill Simonov here:
http://archives.postgresql.org/pgsql-bugs/2010-09/msg00265.php
This is pretty ugly :-(. What we have is
select * from A
left join (select some-expressions from B left join C ...)
The expressions are not guaranteed nullable; that is, they won't
automatically produce nulls from all-null rows for B and C.
This means that we have to evaluate the expressions at the join of
B and C, producing PlaceHolderVars, which then bubble up through
the left join with A, and will get replaced with nulls in any
null-extended row of that join. Then references to them in the final
select list will correctly show as nulls.

The problem is that there isn't anything guaranteeing that the B/C join
will be formed before joining to A. In Kirill's example the planner
thinks it should commute the two joins, and then it evaluates the
PlaceHolderVars at the now-upper B/C join, meaning they fail to go to
nulls in rows where they should.

This is a fundamental oversight in the original design of the
PlaceHolderVar mechanism. I'm not sure how come I didn't see that some
sort of join-order interlock was necessary, but I didn't.

The most trustworthy fix that I can think of goes like this:

1. Add an extra pass over the parsetree early in query_planner() to
locate PlaceHolderVars, set up the PlaceHolderInfo list, and remember the
syntactic level of each actually-used PlaceHolderVar. It's slightly
annoying to have to make an extra tree traversal for this, but at least
we can skip it in the common case where no PlaceHolderVars were created.
(Currently, this processing happens as a side-effect during
distribute_qual_to_rels, but that's too late for what we need to do in
point 2. Also, making the PlaceHolderInfos immediately when the
PlaceHolderVars are created would be unpalatable since we don't know at
that stage whether the PlaceHolderVars are actually used above the
relevant outer join.)

2. In make_outerjoininfo, scan the PlaceHolderInfo list to see if there
are any placeholders that should be evaluated below the current outer join
and are needed above it. If so, enlarge the outer join's min_righthand
set to ensure that it won't get moved below where the placeholder has to
be evaluated. We can fold the existing fix_placeholder_eval_levels()
logic into this operation, which will make that code less fragile than it
is now.

This is a larger change than I would prefer to back-patch, but the only
less-invasive alternative I can see is to lobotomize the PlaceHolderVar
mechanism entirely by reverting to 8.3-style logic wherein we prevented
pullup of sub-selects that would require introduction of placeholders.
That would undo a significant optimization feature of 8.4, one that
I believe we're now relying on for reasonable performance of some system
views.

Thoughts, better ideas?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-09-27 17:21:39 Re: Improving prep_buildtree used in VPATH builds
Previous Message Robert Haas 2010-09-27 17:08:45 Re: BUG #5650: Postgres service showing as stopped when in fact it is running