Re: n_distinct off by a factor of 1000

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Fabio Pardi <f(dot)pardi(at)portavita(dot)eu>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: n_distinct off by a factor of 1000
Date: 2020-06-23 14:14:14
Message-ID: 3c310268-52f2-b960-8efd-bc15b08da7a7@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 6/23/20 7:05 AM, Fabio Pardi wrote:
>
> On 23/06/2020 14:42, Klaudie Willis wrote:
>> I got my first hint of why this problem occurs when I looked at the
>> statistics.  For the column in question, "instrument_ref" the
>> statistics claimed it to be:
>>
>> The default_statistics_target=500, and analyze has been run.
>> select * from pg_stats where attname like 'instr%_ref'; -- Result:
>> *40.000*
>> select count(distinct instrumentid_ref) from bigtable -- Result: *33
>> 385 922 (!!)*
>>
>> That is an astonishing difference of almost a 1000X.
>>
>
> I think you are counting 2 different things here.
>
> The first query returns all the columns "like 'instr%_ref'" present in
> the statistics (so in the whole cluster), while the second is counting
> the actual number of different rows in bigtable.

I believe the OP actually meant the query to be:

select n_distinct from pg_stats where attname like 'instr%_ref';

>
>
> regards,
>
> fabio pardi

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-06-23 14:28:34 Re: pg_dump empty tables
Previous Message Fabio Pardi 2020-06-23 14:05:49 Re: n_distinct off by a factor of 1000