Searching for the cause of a bad plan

From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Searching for the cause of a bad plan
Date: 2007-09-21 10:03:44
Message-ID: 1190369024.4661.173.camel@PCD12478
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

Postgres version: 8.2.4

Tables:

table_a(a bigint, b bigint, primary key(a, b) );

table_b1(b bigint primary key, more columns...);

table_b2(b bigint primary key references table_b1(b), more columns...);

table_b1:
~ 27M rows;
~25 more columns;
width=309 (as reported by explain select *);

table_a:
~400M rows;
- column "b" should reference table_b1, but it does not for performance
reasons (it is an insert only table);
- column "a" distinct values: 1148
- has (a, b) as primary key;
- has no additional columns;

table_b1:
~40K rows;
~70 more columns;
width=1788 (as reported by explain select *);

Statistics for the involved columns for each table are attached in files
(to preserve the spacing). They were taken after analyzing the relevant
table (except for table_b2 where I added the "fiddled" statistics first
and then remembered to analyze fresh, resulting in the "non_fiddled"
version, which gives the same result as the fiddled one).

The problem query is:

prepare test_001(bigint) as
SELECT tb.*
FROM table_a ta
JOIN table_b2 tb ON ta.b=tb.b
WHERE ta.a = $1
ORDER BY ta.a, ta.b
limit 10;

Explain gives Plan 1 (see attached plans.txt)

If I set enable_hashjoin=off and enable_mergejoin=off, I get Plan 2
(again, see plans.txt).

The difference is a 30x improvement in the second case...
(I actually forgot to account for cache effects, but later rerun the
queries multiple times and the timings are proportional).

Additionally, if I replace table_b2 with table_b1 in the query, I get
Plan 3 (with reasonable execution time) with both enable_hashjoin and
enable_mergejoin on. So there is something which makes table_b2
different from table_b1 for planning purposes, but I could not identify
what that is... they have differences in statistics, but fiddling with
the stats gave me no difference in the plan.

Looking at Plan 2, it looks like the "limit" step is estimating wrongly
it's cost. I guessed that it does that because it thinks the "b" values
selected from table_a for a given "a" span a larger range than the "b"
values in table_b2, because the "b" values in table_b2 are a (relatively
small) subset of the "b" values in table_a. But this is not the case,
the query only gets "a" values for which all the "b" values in table_a
will be found in table_b2. Of course the planner has no way to know
this, but then I think it is not the case, as I tried to copy the
histogram statistics in pg_statistic for the column "b" from the entry
for table_b1 (which contains the whole span of "b" values) to the entry
for table_b2, with no change in the plan.

Just for the record, this query is just a part of a more complex one,
which joins in bigger tables, resulting in even worse performance, but I
tracked it down to refusing the nested loop to be the problem.

Is there anything I could do to convince the planner to use here the
nested loop plan ?

Thanks,
Csaba.

Attachment Content-Type Size
table_a_stats.txt text/plain 1.4 KB
table_b1_stats.txt text/plain 1.1 KB
table_b2_stats.txt text/plain 1.1 KB
plans.txt text/plain 3.1 KB
table_b2_stats_non_fiddled.txt text/plain 609 bytes

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Csaba Nagy 2007-09-21 10:08:45 Re: Linux mis-reporting memory
Previous Message Dimitri Fontaine 2007-09-21 09:48:50 Re: Linux mis-reporting memory