Re: estimating # of distinct values

From: Csaba Nagy <ncslists(at)googlemail(dot)com>
To: tv(at)fuzzy(dot)cz
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimating # of distinct values
Date: 2011-01-10 11:02:31
Message-ID: 1294657351.2962.37.camel@clnt-sysecm-cnagy
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2011-01-07 at 12:32 +0100, tv(at)fuzzy(dot)cz wrote:
> the problem is you will eventually need to drop the results and rebuild
> it, as the algorithms do not handle deletes (ok, Florian mentioned an
> algorithm L_0 described in one of the papers, but I'm not sure we can use
> it).

Yes, but even then you can start with much better cards if you already
have an estimate of what it looks like, based on the fact that you did
continuous updating of it. For example you'll have a pretty good
estimate of the bounds of the number of distinct values, while if you
really start from scratch you have nothing to start with but assume that
you must cope with the complete range between all values are distinct ->
there's only a few of them.

> I'm not sure a constantly running background process is a good idea. I'd
> prefer storing an info about the modified tuples somewhere, and starting
> analyze only when a given threshold is reached. I'm not sure how to do
> that, though.
>
> Another thing I'm not sure about is where to store those intermediate
> stats (used to get the current estimate, updated incrementally).

The forks implementation proposed in other responses is probably the
best idea if usable. It will also solve you the problem of memory
limitations, at the expense of more resources used if the structure
grows too big, but it will still be probably fast enough if it stays
small and in cache.

Cheers,
Csaba.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tv 2011-01-10 11:56:24 Re: estimating # of distinct values
Previous Message Greg Smith 2011-01-10 08:25:23 Re: pgbench to the MAXINT