Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables

From: Mark Hampton <mark(at)cleverdba(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables
Date: 2013-04-30 17:13:22
Message-ID: CALqOgZ0t8=V06JXFHDnhxNfzfBpXkK8gZE5wFs3GtQbK0zZyqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a Hibernate-generated query (That's not going to change, so let's
just focus on the Postgres side for now) like this:

SELECT *
from PERSON p
where p.PERSON_ID in (
select distinct p2.PERSON_ID
from PERSON p2
left outer join PERSON_ALIAS pa on
p2.PERSON_ID = pa.PERSON_ID
where (lower(p1.SURNAME) = 'duck' or
lower(pa.SURNAME) = 'duck') and
(lower(p1.FORENAME) = 'donald' or
lower(pa.FORENAME) = 'donald')
)
order by p.PERSON_ID asc;

There are function-based indexes on PERSON and PERSON_ALIAS as follows:

CREATE INDEX PERSON_FORENAME_LOWER_FBIDX ON PERSON (LOWER(FORENAME) VARCHAR
_PATTERN_OPS);
CREATE INDEX PERSON_SURNAME_LOWER_FBIDX ON PERSON (LOWER(SURNAME) VARCHAR
_PATTERN_OPS);
CREATE INDEX PERSON_ALIAS_FORENAME_LOWER_FBIDX ON PERSON_ALIAS
(LOWER(FORENAME) VARCHAR_PATTERN_OPS);
CREATE INDEX PERSON_ALIAS_SURNAME_LOWER_FBIDX ON PERSON_ALIAS
(LOWER(SURNAME) VARCHAR_PATTERN_OPS);

The problem is that the above query doesn't use the indexes. The "or"
clauses across the outer-join seem to be the culprit. If I rewrite the
query as follows, Postgres will use the index:

SELECT *
from PERSON p
where (p.PERSON_ID in (
select p2.PERSON_ID
from TRAVELER.PERSON p2
join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
pa.PERSON_ID
where lower(p2.SURNAME) = 'duck' and
lower(pa.FORENAME) = 'donald'
) or
p.PERSON_ID in (
select p2.PERSON_ID
from TRAVELER.PERSON p2
join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
pa.PERSON_ID
where lower(pa.SURNAME) = 'duck' and
lower(p2.FORENAME) = 'donald'
) or
p.PERSON_ID in (
select p2.PERSON_ID
from TRAVELER.PERSON p2
where lower(p2.SURNAME) = 'duck' and
lower(p2.FORENAME) = 'donald'
) or
p.PERSON_ID in (
select p2.PERSON_ID
from TRAVELER.OTHER_NAME pa
where lower(pa.SURNAME) = 'duck' and
lower(pa.FORENAME) = 'donald'
))
order by p.PERSON_ID asc;

So my question is this: Is there a way to get the Postgres optimizer
"rewrite" the query execution plan to use the equivalent, but much more
efficient latter form?

And before you ask; yes, there are better ways of writing this query. But
we're dealing with Java developers and Hibernate here. It's a legacy
system, and the policy is to avoid hand-written SQL, so for the moment
let's not go down that rabbit hole, and focus on the issue of what the
optimizer can and cannot do.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2013-04-30 19:24:43 Re: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables
Previous Message Christoph Berg 2013-04-30 11:20:55 RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1