From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-09 18:45:08
Message-ID: CA+TgmoaYGgp2dSd4+8CrNCGVKfgjVbuzXQH9e=LUzX4aoL8x4w@mail.gmail.com
Lists: pgsql-hackers
On Mon, Dec 9, 2013 at 1:03 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I really don't believe the 5% thing. It's not enough for n_distinct
>> and it's *far* too high a value for linear properties like histograms
>> or nullfrac etc.
>
> Actually, it is enough for n_distinct, or more properly, 5% is as good
> as you can get for n_distinct unless you're going to jump to scanning
> 50% or more.
I'd like to see a proof of that result.
Not because I'm hostile to changing the algorithm, but because you've
made numerous mathematical claims on this thread that fly in the face
of what Greg, I, and others understand to be mathematically true
- including this one. If our understanding is wrong, then by all
means let's get that fixed. But you're not going to convince anyone
here that we should rip out the existing algorithm and its
peer-reviewed journal citations by making categorical assertions about
the right way to do things.
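
For what it's worth, the estimator at issue is easy to poke at outside the
backend: analyze.c estimates n*d / (n - f1 + f1*n/N) per Haas and Stokes,
where n is the sample size, N the table size, d the number of distinct
values in the sample, and f1 the number of values seen exactly once. A
rough Python sketch along those lines (the Pareto-distributed test column
and the sizes are stand-in assumptions, not a benchmark of anything):

import random
from collections import Counter

def haas_stokes(sample, total_rows):
    # n*d / (n - f1 + f1*n/N): d = distinct values in the sample,
    # f1 = values appearing exactly once in the sample
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)

N = 1_000_000                      # hypothetical table size
random.seed(42)
# stand-in skewed column; real data will of course behave differently
table = [int(random.paretovariate(1.2)) for _ in range(N)]
true_d = len(set(table))

for frac in (0.0003, 0.01, 0.05, 0.5):
    sample = random.sample(table, int(N * frac))
    print("%6.2f%%  estimate %10.0f   actual %d"
          % (frac * 100, haas_stokes(sample, N), true_d))

Running something like that at a few sampling fractions and distributions
would at least turn "5% is as good as you can get" into a testable
statement. It also illustrates the contrast Greg is drawing: the error of a
proportion like nullfrac depends essentially on the absolute number of rows
sampled, roughly sqrt(p(1-p)/n), so a fixed percentage of the table is
overkill there, while n_distinct hinges on the values the sample never saw
at all.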
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company