Re: improving GROUP BY estimation

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improving GROUP BY estimation
Date: 2016-03-31 20:43:58
Message-ID: CAEZATCWb4dThbZqEXEB06SU_dbrHe+s1WOgc0+40kzJAGRnD1A@mail.gmail.com
Lists: pgsql-hackers

On 31 March 2016 at 20:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Also, I wonder if it'd be a good idea to provide a guard against division
> by zero --- we know rel->tuples > 0 at this point, but I'm less sure that
> reldistinct can't be zero. In the same vein, I'm worried about the first
> argument of pow() being slightly negative due to roundoff error, leading
> to a NaN result.

Yeah, that makes sense. In fact, if we only apply the adjustment when
reldistinct > 0 and rel->rows < rel->tuples, and rewrite the first
argument to pow() as (rel->tuples - rel->rows) / rel->tuples, then it
is guaranteed to be non-negative (in fact strictly positive, since
rel->rows < rel->tuples). If rel->rows >= rel->tuples (I'm not sure
whether it can actually be greater), then we just want the original
reldistinct, i.e. no adjustment at all.
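
To make that concrete, here's a rough sketch of the guarded form
(assuming the reldistinct and rel variable names from
estimate_num_groups() in selfuncs.c, and taking as given that
rel->tuples > 0 holds at this point):

    if (reldistinct > 0 && rel->rows < rel->tuples)
    {
        /*
         * Here (rel->tuples - rel->rows) / rel->tuples is strictly
         * positive, so pow() cannot return NaN, and dividing by
         * reldistinct cannot divide by zero.
         */
        reldistinct *= (1 - pow((rel->tuples - rel->rows) / rel->tuples,
                                rel->tuples / reldistinct));
    }

When the condition doesn't hold we simply skip the adjustment and keep
the original reldistinct.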

> Maybe we should also consider clamping the final reldistinct estimate to
> an integer with clamp_row_est(). The existing code doesn't do that but
> it seems like a good idea on general principles.

OK, that seems sensible.
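
Continuing the sketch above, the final step would then be just

    reldistinct = clamp_row_est(reldistinct);

which rounds the estimate to an integer and clamps it to at least one
row, so a zero or fractional estimate can't propagate any further.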

Regards,
Dean
