Re: estimating # of distinct values

From: tv(at)fuzzy(dot)cz
To: "Csaba Nagy" <ncslists(at)googlemail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimating # of distinct values
Date: 2011-01-07 11:32:43
Message-ID: 1d0cb03ef3ab1c04fe87d6910b51d243.squirrel@sq.gransy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
>> How is an incremental ANALYZE going to work at all?
>
> How about a kind of continuous analyze ?
>
> Instead of analyzing just once and then drop the intermediate results,
> keep them on disk for all tables and then piggyback the background
> writer (or have a dedicated process if that's not algorithmically
> feasible) and before writing out stuff update the statistics based on
> the values found in modified buffers. Probably it could take a random
> sample of buffers to minimize overhead, but if it is done by a
> background thread the overhead could be minimal anyway on multi-core
> systems.

Hi,

the problem is you will eventually need to drop the results and rebuild
it, as the algorithms do not handle deletes (ok, Florian mentioned an
algorithm L_0 described in one of the papers, but I'm not sure we can use
it).

I'm not sure a constantly running background process is a good idea. I'd
prefer storing an info about the modified tuples somewhere, and starting
analyze only when a given threshold is reached. I'm not sure how to do
that, though.

Another thing I'm not sure about is where to store those intermediate
stats (used to get the current estimate, updated incrementally). I was
thinking about pg_stats but I'm not sure it's the right place - depending
on the algorithm, this may be a fet kilobytes up to several megabytes. And
it's not needed except when updating it. Any ideas?

regards
Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joachim Wieland 2011-01-07 11:41:38 Re: Snapshot synchronization, again...
Previous Message Simon Riggs 2011-01-07 10:20:35 Re: system views for walsender activity