From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-15 18:02:30
Message-ID: 543EB6B6.4080005@fuzzy.cz
Lists: pgsql-hackers pgsql-performance
On 15.10.2014 19:20, Josh Berkus wrote:
> On 10/10/2014 04:16 AM, Greg Stark wrote:
>> On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Yes, it's only intractable if you're wedded to the idea of a tiny,
>>> fixed-size sample. If we're allowed to sample, say, 1% of the table, we
>>> can get a MUCH more accurate n_distinct estimate using multiple
>>> algorithms, of which HLL is one. While n_distinct will still have some
>>> variance, it'll be over a much smaller range.
>>
>> I've gone looking for papers on this topic but from what I read this
>> isn't so. To get any noticeable improvement you need to read 10-50% of
>> the table and that's effectively the same as reading the entire table
>> -- and it still had pretty poor results. All the research I could find
>> went into how to analyze the whole table while using a reasonable
>> amount of scratch space and how to do it incrementally.
>
> So, right now our estimation is off on large tables by -10X to
> -10000X. First, the fact that it's *always* low is an indication
> we're using the wrong algorithm. Second, we can most certainly do
> better than a median of -1000X.
A few days ago I posted an experimental patch with the adaptive
estimator described in [1]. It's not perfect, but based on the testing
I did I believe it's a superior algorithm to the one we use now. It
would be nice to identify a few datasets where the current estimate is
way off.
[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf
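For a flavor of what the paper does, here is a minimal sketch of its
simpler GEE estimator (Python, purely illustrative; the patch itself
implements the paper's more elaborate adaptive estimator). All it
needs is the sample size, the table size, and the "frequency of
frequencies" of the sampled values:

    from collections import Counter
    from math import sqrt

    def gee_estimate(sample, total_rows):
        # GEE estimator from [1]: D = sqrt(N/n) * f1 + sum(f_j, j >= 2)
        # where f_j = number of values seen exactly j times in the sample
        n = len(sample)
        freq_of_freq = Counter(Counter(sample).values())
        f1 = freq_of_freq.get(1, 0)
        f_rest = sum(c for j, c in freq_of_freq.items() if j >= 2)
        return sqrt(total_rows / n) * f1 + f_rest

As I understand the paper, the sqrt(N/n) scaling of the singletons is
what bounds the worst-case ratio error; the adaptive estimator replaces
it with a data-dependent coefficient, which is why it behaves better on
skewed data.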
> One interesting set of algorithms is block-based sampling. That is,
> you read 5% of the physical table in random blocks, reading every row
> in the block. The block size is determined by your storage block
> size, so you're not actually reading any more physically than you are
> logically; it really is just 5% of the table, especially on SSD.
>
> Then you apply algorithms which first estimate the correlation of
> common values in the block (i.e. how likely is it that the table is
> completely sorted?), and then estimates of how many values there
> might be total based on the correlation estimate.
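For concreteness, a toy sketch of the quoted approach (Python, purely
illustrative - the correlation analysis in the actual research is much
more involved than the naive duplication check here):

    import random
    from collections import Counter

    def block_sample(blocks, fraction=0.05, rng=random.Random(0)):
        # pick whole blocks at random - reading a block is one physical
        # I/O, so a 5% block sample really is ~5% of the table
        k = max(1, int(len(blocks) * fraction))
        chosen = rng.sample(blocks, k)
        rows = [row for b in chosen for row in b]
        return chosen, rows

    def within_block_dup_ratio(chosen):
        # fraction of sampled rows duplicating another row in the same
        # block - a crude stand-in for "how likely is it that the
        # table is sorted/clustered?"
        dup = total = 0
        for block in chosen:
            total += len(block)
            dup += sum(c - 1 for c in Counter(block).values())
        return dup / total if total else 0.0

The duplication ratio is the "how sorted is the table?" signal: if
values repeat within blocks far more often than their sample-wide
frequencies predict, a block sample under-counts distinct values and
the estimator has to compensate.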
I think we might also use a different approach - instead of sampling
the data when ANALYZE kicks in, we might collect the requested sample
of rows on the fly. Say we want a 1% sample - whenever you insert a
new row, you do [random() < 0.01] and if it happens to be true you
keep a copy of the row aside. Then, when you need the sample, you
simply read it and you're done - no random access to the main table,
no problems with estimates being skewed by block-level sampling, etc.
Not sure how to track deletions/updates, though. Maybe rebuilding the
sample if the number of deletions exceeds some threshold, but that
contradicts the whole idea a bit.
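Roughly, the bookkeeping might look like this (a toy sketch in Python,
all names made up, and the deletion handling is exactly the open
question above):

    import random

    class StreamingSample:
        # keep a ~1% Bernoulli sample of inserted rows on the side, so
        # ANALYZE can read just the sample instead of the table
        def __init__(self, fraction=0.01, rebuild_threshold=0.2):
            self.fraction = fraction
            self.rebuild_threshold = rebuild_threshold
            self.sample = []        # row copies kept aside
            self.table_size = 0
            self.deletions = 0      # deletions since the last rebuild

        def insert(self, row):
            self.table_size += 1
            if random.random() < self.fraction:
                self.sample.append(row)

        def delete(self, row):
            self.table_size -= 1
            self.deletions += 1
            if row in self.sample:  # keep the sample consistent
                self.sample.remove(row)
            # signal a full rebuild once too much has churned - which,
            # as noted above, contradicts the whole idea a bit
            return (self.deletions >
                    self.rebuild_threshold * max(self.table_size, 1))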
> I no longer have my ACM membership, so I can't link this, but
> researchers were able to get +/- 3X accuracy for a TPCH workload
> using this approach. A real database would be more variable, of
> course, but even so we should be able to achieve +/- 50X, which
> would be an order of magnitude better than we're doing now.
If you know the title of the article, it's usually available elsewhere
on the web - at the authors' university site, for example. I found
these two articles about block-based sampling:
http://ranger.uta.edu/~gdas/websitepages/preprints-papers/p287-chaudhuri.pdf
https://www.stat.washington.edu/research/reports/1999/tr355.pdf
Maybe there are more, but most of the other links were about how Oracle
does this in 11g.
Tomas