From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: PATCH: adaptive ndistinct estimator v4 |
Date: | 2015-03-31 19:02:29 |
Message-ID: | 551AEF45.7010700@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Hi all,
attached is v4 of the patch implementing adaptive ndistinct estimator.
I've been looking into the strange estimates, mentioned on 2014/12/07:
> values current adaptive
> ------------------------------
> 106 99 107
> 106 8 6449190
> 1006 38 6449190
> 10006 327 42441
I suspected this might be some sort of rounding error in the numerical
optimization (looking for 'm' solving the equation from paper), but
turns out that's not the case.
The adaptive estimator is a bit unstable for skewed distributions, that
are not sufficiently smooth. Whenever f[1] or f[2] was 0 (i.e. there
were no values occuring exactly once or twice in the sample), the result
was rather off.
The simple workaround for this was adding a fallback to GEE when f[1] or
f[2] is 0. GEE is another estimator described in the paper, behaving
much better in those cases.
With the current version, I do get this (with statistics_target=10):
values current adaptive
------------------------------
106 99 108
106 8 178
1006 38 2083
10006 327 11120
The results do change a bit based on the sample, but these values are a
good example of the values I'm getting.
The other examples (with skewed but smooth distributions) work as good
as before.
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
ndistinct-estimator-v4.patch | text/x-diff | 13.1 KB |
ndistinct.sql | application/sql | 3.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-03-31 19:07:08 | Re: GUC context information in the document. |
Previous Message | Tom Lane | 2015-03-31 18:37:17 | Re: Bug #10432 failed to re-find parent key in index |
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2015-03-31 19:52:33 | Some performance testing? |
Previous Message | David G. Johnston | 2015-03-31 17:03:10 | Re: Weird CASE WHEN behaviour causing query to be suddenly very slow |