Just a note about column equivalence disarming the planner

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Just a note about column equivalence disarming the planner
Date: 2011-07-08 20:20:31
Message-ID: 4E17668F.1090603@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello folks,

This isn't really a problem, so much as an observation of just how much
the internals have changed over the years. We've got an older version
we're hoping to upgrade soon, and a developer asked me to optimize this
today:

SELECT order_id
FROM order
WHERE order_id = root_order_id
AND order_id IN (
SELECT DISTINCT m.root_order_id
FROM wacky_orders c
JOIN order m USING (root_order_id)
WHERE m.order_type = 'regular'
GROUP BY m.root_order_id, m.route_id
HAVING COUNT(1) > 1
);

From what I could tell, the query was fine. But this part of the
explain was confusing the hell out of me:

-> Seq Scan on order (cost=0.00..218943.98 rows=24092 width=20)
Filter: (order_id = root_order_id)

The thing is, that subquery there only produced 150 rows. So I shrugged
and simplified further by making a temp table, and got this:

SELECT order_id
FROM order m
JOIN zany_orders z ON (m.order_id = z.root_order_id)
WHERE m.order_id = m.root_order_id;

Which produced this:

Merge Join (cost=220705.29..220826.19 rows=1 width=10)
Merge Cond: (m.order_id = z.root_order_id)
-> Sort (cost=220697.42..220757.65 rows=24092 width=20)
Sort Key: m.order_id
-> Seq Scan on order m (cost=0.00..218943.98 rows=24092 width=20)
Filter: (order_id = root_order_id)
-> Sort (cost=7.87..8.24 rows=149 width=11)
Sort Key: z.root_order_id
-> Seq Scan on zany_orders z (cost=0.00..2.49 rows=149 width=11)

Ok, now it's just screwing with me. The order table has about 5M rows,
and this is clearly not a good idea, here. But then I took a closer
look. Why did it decide to filter based on a condition 90% of the table
fits, and then *merge* those results in with the 150-row temp table?

So, for giggles, I cast a column type to tell the planner it shouldn't
consider the columns equivalent:

SELECT master_order_id
FROM order m
JOIN zany_orders z ON (m.order_id = z.root_order_id)
WHERE m.order_id::VARCHAR = m.root_order_id;

And voila:

Nested Loop (cost=0.00..839.82 rows=1 width=8)
-> Seq Scan on zany_orders z (cost=0.00..2.49 rows=149 width=11)
-> Index Scan using order_pkey on order m (cost=0.00..5.60 rows=1
width=8)
Index Cond: (m.order_id = z.root_order_id)
Filter: ((order_id)::varchar = root_order_id)

I tried this with a mere 9.0 install and it wasn't having any of it. It
completely ignored the red-herring WHERE clause except as a post-filter.
I'm pretty sure that if it were possible to manifest itself to slap me
for even trying, it would have done so.

I've noticed lots of little things like this recently, and I have to
say, the planner has made huge improvements regardless of what
perception may reflect sometimes. It still has some holes and room for
improvement, but I just wanted to thank the devs for all their hard work.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Browse pgsql-performance by date

  From Date Subject
Next Message Anish Kejariwal 2011-07-08 21:36:48 issue with query optimizer when joining two partitioned tables
Previous Message Greg Smith 2011-07-08 17:12:46 Re: "VACUUM FULL ANALYZE" vs. Autovacuum Contention