Quick Links

Re: Improving N-Distinct estimation by ANALYZE

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Improving N-Distinct estimation by ANALYZE
Date:	2006-01-16 20:05:46
Message-ID:	1137441946.3180.187.camel@localhost.localdomain
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, 2006-01-16 at 12:26 -0600, Jim C. Nasby wrote:
> On Fri, Jan 13, 2006 at 11:37:38PM -0500, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > >> It's also worth mentioning that for datatypes that only have an "="
> > >> operator the performance of compute_minimal_stats is O(N^2) when values
> > >> are unique, so increasing sample size is a very bad idea in that case.
> >
> > > Hmmm ... does ANALYZE check for UNIQUE constraints?
> >
> > Our only implementation of UNIQUE constraints is btree indexes, which
> > require more than an "=" operator, so this seems irrelevant.
>
> IIRC, the point was that if we know a field has to be unique, there's no
> sense in doing that part of the analysis on it; you'd only care about
> correlation.

An interesting point: if we know the cardinality by definition there is
no need to ANALYZE for that column. An argument in favour of the
definitional rather than the discovery approach. As a designer I very
often already know the cardinality by definition, so I would appreciate
the opportunity to specify that to the database.

Tom has not spoken against checking for UNIQUE constraints: he is just
pointing out that there never could be a constraint in the case I was
identifying. For other datatypes we could check for them rather than
analyzing the sample, but the effect would be the same either way.

My original point was that behaviour is O(N^2) when unique; it is still
O(N^2) when nearly unique, albeit O(N^2) - O(N). Reducing stats_target
does not help there - the effect varies according to number of rows in
the sample, not num slots in MCV list.

Best Regards, Simon Riggs

In response to

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-16 18:26:43 from Jim C. Nasby

Responses

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-16 20:38:23 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Manfred Koizar	2006-01-16 20:24:38	Re: Improving N-Distinct estimation by ANALYZE
Previous Message	Bruce Momjian	2006-01-16 19:54:32	Re: [HACKERS] message for constraint