Re: Top N queries and disbursion

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Roberto Cornacchia <rcorna(at)tin(dot)it>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Top N queries and disbursion
Date: 1999-10-07 23:16:58
Message-ID: 1078.939338218@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Roberto Cornacchia <rcorna(at)tin(dot)it> writes:
> ... since we are working on snapshots of the 6.6
> release (now we are using snapshot dated 9/13/99) we are afraid of
> instability problems to affect the results. Could you give us any
> suggestion about this? We are quite close to the degree day, so we have
> to optimize time usage...

If you don't want to spend time tracking development changes then you
probably ought to stick with the snapshot you have. I don't see any
reason that you should try to track changes right now...

> We need to estimate the number of distinct values of an attribute. We
> thought 1/disbursion was the right solution, but the results were quite
> wrong:

No, it's certainly not the right thing. To my understanding, disbursion
is a measure of the frequency of the most common value of an attribute;
but that tells you very little about how many other values there are.
1/disbursion is a lower bound on the number of values, but it wouldn't
be a good estimate unless you had reason to think that the values were
pretty evenly distributed. There could be a *lot* of very-infrequent
values.

> with 100 distinct values of an attribute uniformly distribuited in a
> relation of 10000 tuples, disbursion was estimated as 0.002275, giving
> us 440 distinct values.

This is an illustration of the fact that Postgres' disbursion-estimator
is pretty bad :-(. It usually underestimates the frequency of the most
common value, unless the most common value is really frequent
(probability > 0.2 or so). I've been trying to think of a more accurate
way of figuring the statistic that wouldn't be unreasonably slow.
Or, perhaps, we should forget all about disbursion and adopt some other
statistic(s).

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-10-07 23:28:15 Re: [HACKERS] Re: psql and comments
Previous Message Peter Eisentraut 1999-10-07 22:09:43 Re: [HACKERS] psql and comments