Re: n_distinct off by a factor of 1000

From: Fabio Pardi <f(dot)pardi(at)portavita(dot)eu>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: n_distinct off by a factor of 1000
Date: 2020-06-23 14:05:49
Message-ID: 486df039-f581-411c-c169-6d87ce81bcf6@portavita.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 23/06/2020 14:42, Klaudie Willis wrote:
> I got my first hint of why this problem occurs when I looked at the statistics.  For the column in question, "instrument_ref" the statistics claimed it to be:
>
> The default_statistics_target=500, and analyze has been run.
> select * from pg_stats where attname like 'instr%_ref'; -- Result: *40.000*
> select count(distinct instrumentid_ref) from bigtable -- Result: *33 385 922 (!!)*
>
> That is an astonishing difference of almost a 1000X. 
>

I think you are counting 2 different things here.

The first query returns all the columns "like 'instr%_ref'" present in the statistics (so in the whole cluster), while the second is counting the actual number of different rows in bigtable.

regards,

fabio pardi

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-06-23 14:14:14 Re: n_distinct off by a factor of 1000
Previous Message Adrian Klaver 2020-06-23 13:56:35 Re: pg_dump empty tables