From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-06 01:52:34
Message-ID: CAM3SWZREK9cRovD2X=3pMqYgq1QfhG6xmfdwD_gN0FEsH9td+w@mail.gmail.com
Lists: pgsql-hackers
On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> There are fairly well researched algorithms for block-based sampling
> which estimate for the skew introduced by looking at consecutive rows in
> a block. In general, a minimum sample size of 5% is required, and the
> error is no worse than our current system. However, the idea was shot
> down at the time, partly because I think other hackers didn't get the math.
I think that this certainly warrants revisiting. The benefits would be
considerable.
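
[Not part of the original mail: a minimal, self-contained sketch of the two-stage block sampling idea quoted above, over a fabricated table. Stage one picks a random subset of pages, stage two keeps every row on each chosen page. The published block-sampling estimators additionally correct for the correlation between rows that share a page; this toy deliberately omits that and only shows the mechanics. All names and constants are made up for illustration.]

/*
 * Toy two-stage block sample over a fabricated table: stage one picks a
 * random subset of "pages", stage two reads every row on each chosen page.
 * Real block-sampling estimators also correct for the correlation between
 * rows that share a page; this sketch only shows the mechanics.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NPAGES         1000
#define ROWS_PER_PAGE  100
#define SAMPLE_PAGES   50       /* a 5% block sample */

static double page_rows[NPAGES][ROWS_PER_PAGE];

int
main(void)
{
    double  sample_sum = 0.0;
    long    sample_rows = 0;

    srand((unsigned) time(NULL));

    /* Fabricate clustered data: rows on the same page have similar values. */
    for (int p = 0; p < NPAGES; p++)
        for (int r = 0; r < ROWS_PER_PAGE; r++)
            page_rows[p][r] = p + (double) rand() / RAND_MAX;

    /* Sample whole pages, not individual rows. */
    for (int i = 0; i < SAMPLE_PAGES; i++)
    {
        int p = rand() % NPAGES;

        for (int r = 0; r < ROWS_PER_PAGE; r++)
        {
            sample_sum += page_rows[p][r];
            sample_rows++;
        }
    }

    printf("estimated mean = %.2f, true mean = %.2f\n",
           sample_sum / sample_rows, (NPAGES - 1) / 2.0 + 0.5);
    return 0;
}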
Has anyone ever thought about having ANALYZE opportunistically piggy-back
on other full-table scans? That doesn't really help Greg, because his
complaint is mostly that a fresh ANALYZE is too expensive, but it could be
an interesting, albeit risky, approach.
Opportunistically/unpredictably acquiring a ShareUpdateExclusiveLock
would be kind of weird, for one thing, but if a full table scan really
is very expensive, would it be so unreasonable to attempt to amortize
that cost?
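
[Again not part of the original mail: assuming the piggy-backed sample only
needs to see each tuple once, a scan that is already reading the whole heap
could feed a reservoir sampler as a side effect, so the scan pays for the
I/O and the sampling adds only a little CPU per tuple. The sketch below is
plain reservoir sampling (Vitter's Algorithm R) over fake row numbers; it
is an illustration of the principle, not ANALYZE's actual sampler.]

/*
 * Reservoir sampling (Algorithm R) fed by a loop that stands in for a
 * full-table scan.  Row numbers stand in for heap tuples; rand()'s
 * limited range and modulo bias are ignored for brevity.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLE_SIZE 300

static long reservoir[SAMPLE_SIZE];

static void
sample_row(long rownum, long nseen)
{
    if (nseen <= SAMPLE_SIZE)
        reservoir[nseen - 1] = rownum;      /* still filling the reservoir */
    else
    {
        long    k = rand() % nseen;         /* 0 .. nseen - 1 */

        if (k < SAMPLE_SIZE)
            reservoir[k] = rownum;          /* keep with probability SAMPLE_SIZE/nseen */
    }
}

int
main(void)
{
    long    nrows = 1000000;

    srand((unsigned) time(NULL));

    /* Pretend this loop is the full-table scan we are piggy-backing on. */
    for (long row = 1; row <= nrows; row++)
        sample_row(row, row);

    printf("sample of %d rows collected; first three: %ld %ld %ld\n",
           SAMPLE_SIZE, reservoir[0], reservoir[1], reservoir[2]);
    return 0;
}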
--
Peter Geoghegan