From: | "Looby, Denis" <denis(dot)looby(at)hp(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Query plan different depending on the value of where filter |
Date: | 2013-07-17 16:08:51 |
Message-ID: | 7C0E051ADD376F44BF4B660E9EE98B86881C4269@G9W0710.americas.hpqcorp.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi All,
I have an interesting query scenario I'm trying to understand.
I came across this while investigating a slow query in our application.
I've been able to reproduce the scenario in a psql script that sets up the tables and runs the queries.
Script here http://pastebin.com/CBkdDmWp if anyone is interested.
This is the scenario.
Version is "PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 32-bit"
Three tables
Outer_tab : 5000 rows
Inner_tab_1 : 1000 rows
Inner_tab_2 : 16000 rows
This is the query
SELECT outer_tab.outer_key
FROM outer_tab
WHERE outer_tab.outer_key IN (
SELECT inner_tab_1.key_to
FROM inner_tab_2
INNER JOIN inner_tab_1 ON (inner_tab_2.outer_key = inner_tab_1.key_from AND inner_tab_1.type = 2)
WHERE outer_tab.outer_key = inner_tab_1.key_to AND inner_tab_2.group_id = 9
);
Two important things here, type = 2 does not occur in inner_tab_1 and group_id = 9 does not occur in inner_tab_2 and group_id is not indexed.
The result is 0 rows.
Now this is quite slow about 15 seconds on my machine.
Here is the explain plan http://explain.depesz.com/s/BVg
I understand that the seq scan on inner_tab_2 and its 16000 rows is the culprit and the easy fix is to swap inner_tab_2 and inner_tab_1 between the FROM and the JOIN.
This lets it drive off inner_tab_1 with an index scan and skip the sequential scan as seen here http://explain.depesz.com/s/pkG
Much better at 14ms.
That's fine but what has me somewhat confused is if group_id in the WHERE is changed to 1, which does exist in inner_tab_2, we get quite a different plan.
http://explain.depesz.com/s/FX4
It's quick too 63ms
What I don't understand is why the plan is different just because the group_id = has changed value?
Does the planner have some statistical info on the contents of non-indexed rows?
I don't quite understand why this plan executes the sequential scan once, whereas the slow one does it 5001 times, which I believe is the main source of the difference.
Also if I don't ANALYZE the tables the original query will run in a few ms instead of 15 seconds, it actually uses the same query plan that swapping the tables creates.
So it runs the index scan on inner_tab_1 first.
It's a bit surprising that with ANALYSE it picks a plan that is so much worse.
Any one able to shed some light?
Thanks for your time,
Denis Looby
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2013-07-17 17:04:09 | Re: Query plan different depending on the value of where filter |
Previous Message | Muhammad Bashir Al-Noimi | 2013-07-17 14:13:49 | Re: pgAdmin for ubuntu |