From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: query slows down with more accurate stats
Date: 2004-04-16 15:37:15
Message-ID: 1082129835.23419.869.camel@camel
Lists: pgsql-hackers pgsql-performance
On Tue, 2004-04-13 at 15:18, Tom Lane wrote:
> Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> Well, the first problem is why is ANALYZE's estimate of the total row
> count so bad :-( ? I suspect you are running into the situation where
> the initial pages of the table are thinly populated and ANALYZE
> mistakenly assumes the rest are too.
That was my thinking, which is somewhat confirmed after a VACUUM FULL on
the table; ANALYZE now gives pretty accurate stats. Of course, the
downside is that the query is now consistently slower.
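(For anyone following along, a minimal sketch of the check involved here;
"some_table" is a placeholder, not the actual table from the query:)

```sql
-- Compare the planner's stored row estimate with the true row count.
-- A large gap suggests ANALYZE sampled a misleadingly sparse part of
-- the table.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'some_table';

SELECT count(*) AS actual_rows
FROM some_table;

-- Rebuilding the table compacts the thinly populated initial pages,
-- after which a fresh ANALYZE should sample representatively.
VACUUM FULL some_table;
ANALYZE some_table;
```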
> > so i guess i am wondering if there is something I should be doing to
> > help get the better plan at the more accurate stats levels and/or why it
> > doesn't stick with the original plan (I noticed disabling merge joins
> > does seem to push it back to the original plan).
>
> With the larger number of estimated rows it's figuring the nestloop will
> be too expensive. The row estimate for the cl scan went up from 1248
> to 10546, so the estimated cost for the nestloop plan would go to about
> 240000 units vs 80000 for the mergejoin plan. This is obviously off
> rather badly when the true runtimes are 1.7 vs 8.1 seconds :-(.
>
> I think this is an example of a case where we really need better
> estimation of nestloop costs --- it's drastically overestimating the
> relative cost of the nestloop because it's not accounting for the cache
> benefits of the repeated index searches. You could probably force the
> nestloop to be chosen by lowering random_page_cost, but that's just a
> kluge solution ... the real problem is the model is wrong.
>
Unfortunately, playing with random_page_cost doesn't seem to be enough to
get it to favor the nested loop, though setting it down to 2 does help
overall. I played with cpu_index_tuple_cost a bit, but that seemed even
less useful. It's aggravating when you know there is a better plan it
could pick but there's no (clean) way to get it to do so...
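(For the record, the kind of per-session experiments I mean, sketched with
a placeholder query:)

```sql
-- Force the planner away from the mergejoin to see the nestloop's
-- actual runtime; this is a diagnostic, not a production setting.
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT ...;  -- substitute the real query here
SET enable_mergejoin = on;

-- Cheaper assumed random I/O makes index-driven nestloops look better
-- to the cost model.
SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT ...;
RESET random_page_cost;
```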
> I have a to-do item to work on this, and will try to bump up its
> priority a bit.
>
I'll keep an eye out, thanks Tom.
Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL