From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: [HACKERS] What does explain show ? |
Date: | 2000-01-05 00:30:37 |
Message-ID: | 15955.947032237@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Quite some time ago, "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> wrote:
> I have a question about "explain" output.
> Table a has 15905 rows and table b has 25905 rows.
> For the following query
> select a.pkey, b.key2 from a, b
> where b.key1 = 1369
> and a.pkey = b.key1;
> "explain" shows
> NOTICE: QUERY PLAN:
> Nested Loop (cost=6.19 rows=3 width=10)
> -> Index Scan using b_pkey on b on b (cost=2.09 rows=2 width=6)
> -> Index Scan using a_pkey on a on a (cost=2.05 rows=15905 width=4)
> What does "rows=15905" of InnerPlan mean ?
I have finally traced through enough of the optimizer logic that I
understand where these numbers are coming from. A nestloop with an
inner index scan is a slightly unusual beast, because the cost of the
inner scan can often be reduced by using the join conditions as index
restrictions. For example, if we have "outer.a = inner.b" and the
inner scan is an indexscan on b, then during the inner scan that's
done for an outer tuple with a = 42 we'd use "b = 42" as an indexqual.
This makes the inner scan much cheaper than it would be if we had to
scan the whole table.
Now the problem is that the "rows=" numbers come from the RelOptInfo
nodes for each relation, and they are set independently of the context
that the relation is used in. For any context except an inner
indexscan, we would indeed have to scan all 15905 rows of a, because
we have no pure-restriction WHERE clauses that apply to a. So that's
why rows says 15905. The cost is being estimated correctly for the
context, though --- an indexscan across 15905 rows would take a lot more
than 2 disk accesses.
This is just a cosmetic bug since it doesn't affect the planner's cost
estimate; still, it makes the EXPLAIN output confusing. I think the
output for a nestloop should probably show the estimated number of rows
that will be scanned during each pass of the inner indexscan, which
would be about 1 in the above example. This could be done by saving the
estimated row count (or just the selectivity) in IndexScan path nodes.
Comments? Does anyone think we should show some other number?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Don Baccus | 2000-01-05 01:29:44 | Re: [HACKERS] Inprise/Borland releasing Interbase as Open source |
Previous Message | Jan Wieck | 2000-01-05 00:20:02 | Thomas! FOREIGN KEY problem! |