pg_statistic "forced" values

From: Jacques Caron <jc(at)oxado(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_statistic "forced" values
Date: 2007-11-07 16:18:29
Message-ID: 20071107161839.03C8C1165946@zeus.directinfos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

It is well known that in some instances the Postgresql will make
estimates of the number of distinct values in a table that can be
quite far off reality. This then has a tendency to make the planner
lean towards unsavory plans (read: seqscans) because it estimates the
number of lines returned by a part of the request as being quite a
lot more than they really are.

The "good" solution would be to fix the estimator, but there has
already been long discussions on this topic in the past years and
apparently no consensus was found, with alternatives proposed
"fixing" some cases where the current estimator is wrong but getting
in trouble in others, or requiring quite a bit more CPU/memory/disk
I/O to achieve their results (correct me if I'm wrong).

There is a "simple" way to override this, which is to change the
value present in pg_statistic, however it will be overwritten the
next time ANALYZE (or VACUUM ANALYZE) is run. This thus requires
adding updates to this value every time a request that might be
fooled by it is executed, which is cumbersome, and does not
facilitate updates of this value (especially with positive values of
stadistinct).

It seems to me it would be a good idea to be able to store a forced
value for stadistinct in pg_attribute (with optionally some clauses
to set/change/reset it in CREATE TABLE, ALTER TABLE ADD COLUMN and
ALTER TABLE ALTER COLUMN, in a way similar to the STATISTICS clauses).

Alternatively, it could be a simple boolean to just say "don't update
stadistinct".

Or did I miss something and this already exists somewhere?

If not, are there any comments or suggestions regarding implementing this?

Thanks,

Jacques.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-11-07 16:34:52 Re: Estimation problem with a LIKE clause containing a /
Previous Message Tom Lane 2007-11-07 16:02:00 Re: Weird type selection choice