Re: n_distinct off by a factor of 1000

From: Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: n_distinct off by a factor of 1000
Date: 2020-06-25 13:27:08
Message-ID: b750644a-9bad-d4fe-0b85-7ebaa6053f00@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hello,

> I got my first hint of why this problem occurs when I looked at the
> statistics.  For the column in question, "instrument_ref" the
> statistics claimed it to be:
>
> The default_statistics_target=500, and analyze has been run.
> select * from pg_stats where attname like 'instr%_ref'; -- Result:
> *40.000*
> select count(distinct instrumentid_ref) from bigtable -- Result: *33
> 385 922 (!!)*
>
> That is an astonishing difference of almost a 1000X.

> I have tried to increase the statistics target to 5000, and it helps,
> but it reduces the error to 100X.  Still crazy high.

As far as I know, increasing default_statistics_target will not help. [1]

> I have considered these fixes:
> - hardcode the statistics to a particular ratio of the total number of
> rows

You can hardcode the percentage of distinct values:
ALTER TABLE bigtable ALTER COLUMN instrument_ref SET ( n_distinct=-0.06
); /* -1 * (33385922 / 500000000) */

[1]
https://www.postgresql.org/message-id/4136ffa0812111823u645b6ec9wdca60b3da4b00499%40mail.gmail.com

-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Wolff, Ken L 2020-06-25 15:15:07 RE: EXTERNAL: Re: Netapp SnapCenter
Previous Message Matthias Apitz 2020-06-25 12:54:02 Re: error messages (autovaccum canceled and syntax errors) while loading a DUMP