Re: n_distinct off by a factor of 1000

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: n_distinct off by a factor of 1000
Date: 2020-06-24 09:54:48
Message-ID: 20200624095448.GA417@hjp.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2020-06-24 07:30:05 +0000, Klaudie Willis wrote:
> show default_statistics_target; --> 500
> ALTER TABLE public.bigtable ALTER COLUMN instrumentid_ref SET STATISTICS 5000;
>
> Here is the output of the "ANALYZE VERBOSE bigtable;"
> INFO: analyzing "public.bigtables" inheritance tree
[...]
> INFO: analyzing "public.bigtable_y2018"
> INFO: "bigtable_y2018": scanned 1500000 of 10661013 pages, containing 28915115
> live rows and 12589 dead rows; 1500000 rows in sample, 205509611 estimated
> total rows
> INFO: analyzing "public.bigtable_y2019"
> INFO: "bigtable_y2019": scanned 1500000 of 8911886 pages, containing 28888514
> live rows and 17778 dead rows; 1500000 rows in sample, 171634096 estimated
> total rows
> INFO: analyzing "public.bigtable_y2020"
> INFO: "bigtable_y2020": scanned 1500000 of 6126616 pages, containing 29488967
> live rows and 6330 dead rows; 1500000 rows in sample, 120445051 estimated total
> rows
> INFO: analyzing "public.bigtable_y2021"
> INFO: "bigtable_y2021": scanned 1 of 1 pages, containing 8 live rows and 0
> dead rows; 8 rows in sample, 8 estimated total rows

So it scans 1500000 rows from each partition, but ...

[...]
> frac_MCV;n_distinct; n_mcv; n_hist;tablename
> 0.9205394 122160 2140 5001 "bigtable"
> 0.9203018 124312 1736 5001 "bigtable_y2018"
> 0.9258158 113846 2107 5001 "bigtable_y2020"
> 0.875 -0.375 2 "bigtable_y2021"
> 0.92304045 118267 2204 5001 "bigtable_y2019"

Estimates the number of distinct values in each partition as between
113846 and 124312. So it can have encountered at most that many
different values, which means that it must have encountered each value
about 12 or 13 times on average.

My guess is that there are relatively few (less than 120000) distinct
values which make up the bulk (over 90 %) of these tables and a lot (33
million) values which are very rare.

Is this guess correct?

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp(at)hjp(dot)at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter J. Holzer 2020-06-24 10:47:36 Re: Persistent Connections
Previous Message Jaime Soler 2020-06-24 08:41:17 pgbench and timestamps