Re: Nested loop vs merge join: inconsistencies between estimated and actual time

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Nested loop vs merge join: inconsistencies between estimated and actual time
Date: 2008-03-07 06:35:46
Message-ID: 7386.1204871746@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru> writes:
> I've came across this issue while writing report-like query for 2 not
> very large tables. I've tried several methods to resolve this one (see
> below). But now I'm really stuck...

It looks like you are wishing to optimize for all-in-memory situations,
in which case the traditional advice is to reduce random_page_cost to
something close to 1. AFAICS all the rowcount estimates you're seeing
are spot on, or as close to spot on as you could realistically hope for,
and so the problem lies with the cost parameters. Fooling with the
statistics is not going to help if the rowcount estimates are already
good.

(Note: the apparent undercounts you're seeing on indexscans on the outer
side of a mergejoin seem to be because the mergejoin terminates early
due to limited range of the other input join key. The planner is
expecting this, as we can see because the predicted cost of the join is
actually much less than the predicted cost of running the input
indexscan to completion. The cost ratio is about consistent with the
rowcount ratio, which makes me think it got these right too.)

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Rotek 2008-03-07 08:35:42 Toast space grows
Previous Message Vlad Arkhipov 2008-03-07 05:50:40 Nested loop vs merge join: inconsistencies between estimated and actual time