From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Kovacs <peter(dot)kovacs(at)chemaxon(dot)hu>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-general(at)postgresql(dot)org
Subject: Re: Why won't it index scan?
Date: 2006-05-18 16:09:49
Message-ID: 200605181609.k4IG9nZ10282@candle.pha.pa.us
Lists: pgsql-general
Peter Kovacs wrote:
> Sorry for the naive question, but: is there a problem with analyze doing
> full table scans? Analyze will not lock anything, will it?
It used to do full table scans, but the read overhead was too great.
---------------------------------------------------------------------------
>
> Peter
>
> Greg Stark wrote:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >
> >
> >> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> >>
> >>> So, does this sound like we just happened to get repeatedly
> >>> horribly unrepresentative random samples with stats target at
> >>> 10? Are we at the mercy of randomness here? Or is there a
> >>> better preventive procedure we can follow to systematically
> >>> identify this kind of situation?
> >>>
> >> I think the real issue is that stats target 10 is too small for large
> >> tables: the samples are just not large enough to support a decent
> >> numdistinct estimate, which is the critical stat for cases such as this
> >> (ie, estimating the number of hits on a value that's not in the
> >> most-common-values list).
> >>
> >
> > There's been some discussion on -hackers about this area. Sadly the idea of
> > using samples to calculate numdistinct estimates is fundamentally on pretty
> > shaky ground.
> >
> > A fixed sample size works fine for estimating the distribution of values,
> > but to get consistent precision for numdistinct estimates the sample has
> > to be a constant fraction of the table -- and unfortunately a pretty
> > large fraction at that.
> >
> > So sadly I think "at the mercy of randomness" is pretty accurate. You'll have
> > to raise the statistics target as the table grows and I expect you'll
> > eventually run into some downsides of large stats targets.
> >
> > Some better algorithms were posted, but they would require full table scans
> > during analyze, not just samples.
> >
> >
>
>
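Greg's point about fixed-size samples can be sketched with a quick simulation. PostgreSQL's ANALYZE samples roughly 300 * statistics_target rows (about 3000 rows at the default target of 10), so the number of distinct values visible in the sample is capped near the sample size no matter how many distinct values the table really contains. The table sizes, seed, and 10-rows-per-value shape below are arbitrary illustrations, not anything measured in this thread:

```python
import random

# Hypothetical fixed sample size, roughly what a statistics
# target of 10 gives (300 * 10 sampled rows).
SAMPLE_SIZE = 3000

def sampled_distinct(table):
    """Count distinct values seen in a fixed-size random sample."""
    sample = random.sample(table, SAMPLE_SIZE)
    return len(set(sample))

random.seed(42)
for n_rows in (10_000, 1_000_000):
    # Build a table with 10 rows per distinct value, so the true
    # distinct count grows linearly with the table.
    n_distinct = n_rows // 10
    table = [random.randrange(n_distinct) for _ in range(n_rows)]
    seen = sampled_distinct(table)
    print(f"rows={n_rows:>9}  true distinct={n_distinct:>7}  "
          f"seen in sample={seen}")
```

At 10,000 rows the sample covers most of the distinct values, but at 1,000,000 rows the sample can report at most 3000 of the 100,000 true distinct values, which is why the sample fraction, not the sample size, is what matters here.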
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +