From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: improving GROUP BY estimation |
Date: | 2016-03-31 19:18:23 |
Message-ID: | 20518.1459451903@sss.pgh.pa.us |
Lists: | pgsql-hackers |

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 30 March 2016 at 14:03, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Attached is v4 of the patch
> Thanks, I think this is good to go, except that I think we need to use
> pow() rather than powl() because AIUI powl() is new to C99, and so
> won't necessarily be available on all supported platforms. I don't
> think we need worry about loss of precision, since that would only be
> an issue if rel->rows / rel->tuples were smaller than maybe 10^-14 or
> so, and it seems unlikely we'll get anywhere near that any time soon.

I took a quick look. I concur with using pow() not powl(); the latter
is not in SUS v2 which is our baseline portability expectation, and in
fact there is *noplace* where we expect long double to work. Moreover,
I don't believe that any of the estimates we're working with are so
accurate that a double-width power result would be a useful improvement.

Also, I wonder if it'd be a good idea to provide a guard against division
by zero --- we know rel->tuples > 0 at this point, but I'm less sure that
reldistinct can't be zero. In the same vein, I'm worried about the first
argument of pow() being slightly negative due to roundoff error, leading
to a NaN result.

Maybe we should also consider clamping the final reldistinct estimate to
an integer with clamp_row_est(). The existing code doesn't do that but
it seems like a good idea on general principles.

Another minor gripe is the use of a random URL as justification. This
code will still be around when that URL exists nowhere but the Wayback
Machine. Can't we find a more formal citation to use?

regards, tom lane