Re: estimating # of distinct values

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: estimating # of distinct values
Date: 2011-01-18 22:16:24
Message-ID: 4D361138.8060802@fuzzy.cz
Lists: pgsql-hackers

On 18.1.2011 18:12, Robert Haas wrote:
> On Tue, Jan 18, 2011 at 4:53 AM, <tv(at)fuzzy(dot)cz> wrote:
>> So the most important question is how to intercept the new/updated rows,
>> and where to store them. I think each backend should maintain it's own
>> private list of new records and forward them only in case of commit. Does
>> that sound reasonable?
>
> At the risk of sounding demoralizing, nothing about this proposal
> sounds very promising to me, and that sounds like a particularly bad
> idea. What happens if the transaction updates a billion records? Or
> even a million records? Are you going to store all of those in
> backend-local memory until commit time? Or spool them in a
> potentially-gigantic disk file somewhere? That memory allocation - or
> file - could grow to be larger than the size of the entire database in
> the worst case. And COMMIT could take an awfully long time if it has
> to spool megabytes or gigabytes of data off to some other process.
> And what happens if there's a crash after the COMMIT but before all
> that data is sent? The estimates become permanently wrong?

Yes, I was aware of this problem (the amount of memory consumed with
lots of updates), and I kind of hoped someone would come up with a
reasonable solution.

I've been thinking about it, and I believe at least one of the
algorithms (the "adaptive sampling") might support "merging", i.e. the
backends would not need to store the list of values, just keep a private
copy of the estimator and update that. At the end (after commit), the
private copy would be merged back into the actual estimator.

So the algorithm would be something like this:

1. create a local copy of each distinct estimator influenced by the INSERT
2. update the local copies
3. on commit, merge the local copies back into the original estimators
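
A minimal sketch of how that per-backend flow might look, assuming
adaptive.c can expose copy/add/merge operations (the names
AdaptiveEstimator, ae_copy, ae_add_item and ae_merge are hypothetical,
just for illustration, not the actual interface):

    typedef struct AdaptiveEstimator AdaptiveEstimator;

    extern AdaptiveEstimator *ae_copy(const AdaptiveEstimator *est);
    extern void ae_add_item(AdaptiveEstimator *est, const char *value, int len);
    extern void ae_merge(AdaptiveEstimator *into, const AdaptiveEstimator *from);

    static AdaptiveEstimator *local_copy = NULL;

    /* called for every row the transaction inserts/updates */
    static void
    track_new_value(AdaptiveEstimator *shared, const char *value, int len)
    {
        if (local_copy == NULL)
            local_copy = ae_copy(shared);       /* 1. make a private copy */

        ae_add_item(local_copy, value, len);    /* 2. update the copy only */
    }

    /* called once, after a successful COMMIT */
    static void
    merge_on_commit(AdaptiveEstimator *shared)
    {
        if (local_copy != NULL)
            ae_merge(shared, local_copy);       /* 3. fold the copy back in */
    }

So the backend never keeps the new values themselves, only a fixed-size
estimator, no matter how many rows the transaction touches.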

The amount of copied data is very small; it depends on the number of
distinct items the estimator can handle and on the required precision
(see adaptive.c for details). Some examples:

2^48 items + 1% error => 86 kB
2^48 items + 5% error => 3.4 kB
2^64 items + 1% error => 115 kB
2^64 items + 5% error => 4.6 kB

so it's much better than storing all the data.
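
Just to illustrate where numbers of this size come from - assuming the
estimator keeps a capped sample of hashed values and the standard error
is roughly 1.20/sqrt(sample size) (the usual adaptive sampling bound;
this is an assumption about adaptive.c, not taken from it), a quick
back-of-the-envelope check lands very close to the figures above:

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        int     bits[2] = {48, 64};     /* hash width in bits */
        double  err[2]  = {0.01, 0.05}; /* target standard error */

        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
            {
                double  nsamples = pow(1.20 / err[j], 2.0); /* ~14400 or ~576 */
                double  kbytes   = nsamples * bits[i] / 8.0 / 1000.0;

                printf("2^%d items + %.0f%% error => %.1f kB\n",
                       bits[i], err[j] * 100.0, kbytes);
            }

        return 0;
    }

i.e. a few kB up to roughly a hundred kB per estimator, regardless of
how many rows the transaction modifies.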

But I really need to check whether this is possible (right now I'm quite
busy organizing a local conference, so maybe next week).

Regarding the crash scenario - if the commit fails, just throw away the
local estimator copy; it's not needed. I'm not sure how to handle the
case where the commit succeeds but the write of the merged estimator
fails. I think it might be possible to write the estimator to the xlog
or something like that, so it would be replayed during recovery etc. Or
is that a stupid idea?
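
For illustration only - reusing the hypothetical names from the sketch
above, with a made-up log_estimator_update() standing in for whatever
xlog plumbing would actually be needed:

    extern void log_estimator_update(const AdaptiveEstimator *est); /* made up */

    /* called after a successful COMMIT instead of plain merge_on_commit() */
    static void
    merge_and_log(AdaptiveEstimator *shared, const AdaptiveEstimator *local)
    {
        ae_merge(shared, local);

        /*
         * Write a WAL record carrying the serialized estimator, so recovery
         * can redo the update even if the regular write of the merged
         * estimator never makes it to disk.
         */
        log_estimator_update(shared);
    }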

> And are we doing all of this just to get a more accurate estimate of
> ndistinct? For the amount of effort that it will probably take to get
> this working at all, you could probably implement index-only scans and
> have enough bandwidth left over to tackle global temporary tables.
> And unless I'm missing something, the chances that the performance
> consequences will be tolerable are pretty much zero. And it would
> only benefit the tiny fraction of users for whom bad n_distinct
> estimates cause bad plans, and then the even tinier percentage of
> those who can't conveniently fix it by using the manual override that
> we already have in place - which presumably means people who have
> gigantic tables that are regularly rewritten with massive changes in
> the data distribution that affect plan choice. Is that more than the
> empty set?

First of all, I've never proposed to use this as the default behaviour.
Actually, I've strongly argued against that idea on several occasions.
The user would have to enable it explicitly for particular columns; I'm
really not going to force everyone to pay for this.

So the goal is to help the "small fraction" of users and keep the
current solution for the rest of them.

And yes, the main reason I am working on this is multi-column stats (for
discrete columns). You can fix a single-column ndistinct estimate by
manually overriding it, but for multi-column stats you need an estimate
of ndistinct for the combination of columns, and that's rather difficult
to do manually.

The other reason is that I've studied statistics (and I enjoy it, which
seems weird to most people), and this is a good way to learn the internals.

> Maybe the idea here is that this wouldn't fix just ndistinct estimates
> but would also help with multi-column statistics. Even if that's the
> case, I think it's almost certainly a dead end from a performance
> standpoint. Some kind of manual ANALYZE process that can be invoked
> when needed to scan the entire table would be painful but maybe
> acceptable for certain people with a problem they can't fix any other
> way, but doing this automatically for everyone seems like a really bad
> idea.

Yes, as I've mentioned above, the multi-column stats are the reason why
I started working on this. And yes, it really should work like this:

1. the user realizes something is wrong because the plans are bad
2. after some analysis, the user realizes the cause is an imprecise
   estimate due to dependence between columns
3. the user enables cross-column stats on those columns and checks
   whether it actually solves the issue
4. if the cost outweighs the gains, the user drops the stats

Does that sound reasonable?

regards
Tomas
