From: David Gould <daveg(at)sonic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-03 02:57:52
Message-ID: 20180302185752.46b82671@engels
Lists: pgsql-hackers
On Fri, 02 Mar 2018 17:17:29 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> But by the same token, analyze only looked at 0.0006 of the pages. It's
> nice that for you, that's enough to get a robust estimate of the density
> everywhere; but I have a nasty feeling that that won't hold good for
> everybody.
My grasp of statistics is somewhat weak, so please tell me if I've got this
wrong, but every time I've looked into it I've found that one can get good
accuracy at high confidence from a fairly small sample, because the required
sample size depends on the desired margin of error and confidence level, not
on the population size (beyond a small finite-population correction).
Typically 1,000 samples will serve no matter the population size if the
desired margin of error is 5%. Even at 99% confidence with a 1% margin of
error it takes fewer than 20,000 samples. See the table at:
http://www.research-advisors.com/tools/SampleSize.htm
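To make those figures concrete, here is a minimal sketch in Python of where
tables like that one come from: Cochran's sample-size formula with a
finite-population correction. The function name and the worst-case p = 0.5
are my own choices for illustration, not anything from that site:

    from math import ceil
    from statistics import NormalDist

    def sample_size(margin, confidence, population=None, p=0.5):
        """Samples needed to estimate a proportion within +/- margin at
        the given confidence; p = 0.5 is the worst case (largest n)."""
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed z
        n0 = z * z * p * (1 - p) / (margin * margin)  # infinite population
        if population is None:
            return ceil(n0)
        return ceil(n0 / (1 + (n0 - 1) / population))  # finite correction

    print(sample_size(0.05, 0.95))  # 385   (5% margin, 95% confidence)
    print(sample_size(0.01, 0.99))  # 16588 (1% margin, 99% confidence)

Note that the answers are the same whether the population is fifty thousand
rows or fifty billion; the finite-population correction only ever makes the
required sample smaller.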
Since we sample 30,000 pages by default, and since ANALYZE takes some
trouble to get a genuinely random sample, I think we really can rely on the
results of extrapolating reltuples from ANALYZE.
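For illustration, a minimal sketch of the extrapolation itself (hypothetical
numbers, not the actual ANALYZE code): compute the live-tuple density over
the sampled pages, then scale it by the relation's total page count:

    def estimate_reltuples(total_pages, pages_sampled, live_tuples_in_sample):
        # Live tuples per page over the sample, scaled to the relation.
        if pages_sampled == 0:
            return 0.0
        density = live_tuples_in_sample / pages_sampled
        return density * total_pages

    # e.g. a 5,000,000-page table where 30,000 sampled pages held
    # 2,400,000 live tuples (80 per page):
    print(estimate_reltuples(5_000_000, 30_000, 2_400_000))  # 400000000.0

Given the sample-size numbers above, the error in the density estimate
should be small, so the scaled reltuples should be correspondingly close.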
-dg
--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.