Quick Links

Cost estimation problem on seq scan in a loop

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Cost estimation problem on seq scan in a loop
Date:	2013-12-16 23:41:18
Message-ID:	CAMkU=1zHSXESAZhqH5m5oabqV8Je2iGFPAX=SK2zbGv5ZOsvBA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

While doing ad hoc queries I've seen several different problems that all
seem to be variations on a theme.

The plan comes out looking like this:

Nested Loop (cost=826867.95..877038.04 rows=1 width=125)
Join Filter: (foo.bar = smallish_table.bar)
-> Something Complicated (cost=826867.95..876800.28 rows=1 width=81)
.....
-> Seq Scan on smallish_table (cost=0.00..142.89 rows=7389 width=44)

The estimate of rows=1 for Something Complicated is wrong and you really
get 1000 or 100,000 rows. Meaning the seq scan on smallish_table gets
iterated a lot, and the time really adds up.

It would be great if Something Complicated had the correct row estimate,
but since I've seen this situation arise with a lot of different Something
Complicated that don't have much to do with each other (although usually an
antijoin of some kind is involved) , there is little reason to think we can
squash every one of them.

Is there some principled way to go about teaching the planner that hashing
smallish_table on the join filter key is a cheap insurance policy against
underestimating the row count of the outer loop?

Cheers,

Jeff

Responses

Re: Cost estimation problem on seq scan in a loop at 2013-12-17 00:09:19 from Greg Stark

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2013-12-16 23:59:12	Re: planner missing a trick for foreign tables w/OR conditions
Previous Message	Heikki Linnakangas	2013-12-16 22:49:32	Re: GIN improvements part 1: additional information