Re: Yet another abort-early plan disaster on 9.3

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-15 18:02:30
Message-ID: 543EB6B6.4080005@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 15.10.2014 19:20, Josh Berkus wrote:
> On 10/10/2014 04:16 AM, Greg Stark wrote:
>> On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Yes, it's only intractable if you're wedded to the idea of a tiny,
>>> fixed-size sample. If we're allowed to sample, say, 1% of the table, we
>>> can get a MUCH more accurate n_distinct estimate using multiple
>>> algorithms, of which HLL is one. While n_distinct will still have some
>>> variance, it'll be over a much smaller range.
>>
>> I've gone looking for papers on this topic but from what I read this
>> isn't so. To get any noticeable improvement you need to read 10-50% of
>> the table and that's effectively the same as reading the entire table
>> -- and it still had pretty poor results. All the research I could find
>> went into how to analyze the whole table while using a reasonable
>> amount of scratch space and how to do it incrementally.
>
> So, right now our estimation is off on large tables by -10X to
> -10000X. First, the fact that it's *always* low is an indication
> we're using the wrong algorithm. Second, we can most certainly do
> better than a median of -1000X.

A few days ago I posted an experimental patch with the adaptive
estimator, described in [1]. Not perfect, but based on the testing I did
I believe it's a superior algorithm to the one we use now. Would be nice
to identify a few datasets where the current estimate is way off.

[1]
http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf

> One interesting set of algorithms is block-based sampling. That is,
> you read 5% of the physical table in random blocks, reading every row
> in the block. The block size is determined by your storage block
> size, so you're not actually reading any more physically than you are
> logically; it really is just 5% of the table, especially on SSD.
>
> Then you apply algorithms which first estimate the correlation of
> common values in the block (i.e. how likely is it that the table is
> completely sorted?), and then estimates of how many values there
> might be total based on the correlation estimate.

I think we might also use a different approach - instead of sampling the
data when ANALYZE kicks in, we might collect a requested sample of rows
on the fly. Say we want 1% sample - whenever you insert a new row, you
do [random() < 0.01] and if it happens to be true you keep a copy of the
row aside. Then, when you need the sample, you simply read the sample
and you're done - no random access to the main table, no problems with
estimated being off due to block-level sampling, etc.

Not sure how to track deletions/updates, though. Maybe rebuilding the
sample if the number of deletions exceeds some threshold, but that
contradicts the whole idea a bit.

> I no longer have my ACM membership, so I can't link this, but
> researchers were able to get +/- 3X accuracy for a TPCH workload
> using this approach. A real database would be more variable, of
> course, but even so we should be able to achieve +/- 50X, which
> would be an order of magnitude better than we're doing now.

If you know the title of the article, it's usually available elsewhere
on the web - either at the university site, or elsewhere. I found these
two articles about block-based sampling:

http://ranger.uta.edu/~gdas/websitepages/preprints-papers/p287-chaudhuri.pdf

https://www.stat.washington.edu/research/reports/1999/tr355.pdf

Maybe there are more, but most of the other links were about how Oracle
does this in 11g.

Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-10-15 18:46:04 Re: Column Redaction
Previous Message Jeff Janes 2014-10-15 17:41:46 Re: Buffer Requests Trace

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Johansen 2014-10-15 20:05:07 Re: Partitions and work_mem?
Previous Message Josh Berkus 2014-10-15 17:20:18 Re: Yet another abort-early plan disaster on 9.3