From: | Mat Arye <mat(at)timescale(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Additional Statistics Hooks |
Date: | 2018-03-13 15:25:13 |
Message-ID: | CADsUR0Bnh=PUHkNVHNGb-U8ckbck1An-pX8kwXkt=TZzVMOK6Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 13, 2018 at 6:56 AM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Tue, Mar 13, 2018 at 4:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Mat Arye <mat(at)timescale(dot)com> writes:
> >> So the use-case is an analytical query like
> >
> >> SELECT date_trunc('hour', time) AS MetricMinuteTs, AVG(value) as avg
> >> FROM hyper
> >> WHERE time >= '2001-01-04T00:00:00' AND time <= '2001-01-05T01:00:00'
> >> GROUP BY MetricMinuteTs
> >> ORDER BY MetricMinuteTs DESC;
> >
> >> Right now this query will choose a much-less-efficient GroupAggregate
> plan
> >> instead of a HashAggregate. It will choose this because it thinks the
> >> number of groups
> >> produced here is 9,000,000 because that's the number of distinct time
> >> values there are.
> >> But, because date_trunc "buckets" the values there will be about 24
> groups
> >> (1 for each hour).
> >
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution. On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions. I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.
>
> Like cost associated with a function, we may associate mapping
> cardinality with a function. It tells how many distinct input values
> map to 1 output value. By input value, I mean input argument tuple. In
> Mat's case the mapping cardinality will be 12. The number of distinct
> values that function may output is estimated as number of estimated
> rows / mapping cardinality of that function.
>
I think this is complicated by the fact that the mapping cardinality is not
a constant per function: it depends on the constant given as the first
argument to the function and on the granularity of the underlying data
(whether you have second granularity or microsecond granularity). I
actually think the logic for the estimate here should be
(max(time) - min(time)) / interval. I think to be general you need to
allow functions on statistics to determine the estimate.
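To make the arithmetic concrete, here is a toy sketch (not PostgreSQL
planner code; the function name and signature are made up for
illustration) of the (max(time) - min(time)) / interval estimate described
above, applied to the WHERE range from the example query:

```python
from datetime import datetime, timedelta

def estimate_groups(min_time, max_time, bucket_width):
    """Estimate distinct date_trunc-style buckets covering [min_time, max_time]."""
    span = max_time - min_time
    # Roughly one bucket per interval of width bucket_width, plus one for
    # the partial bucket at the start of the range; never fewer than 1.
    return max(1, int(span / bucket_width) + 1)

# The example query's WHERE clause spans 25 hours, so hourly bucketing
# yields about two dozen groups rather than 9,000,000 distinct timestamps.
lo = datetime(2001, 1, 4, 0, 0, 0)
hi = datetime(2001, 1, 5, 1, 0, 0)
print(estimate_groups(lo, hi, timedelta(hours=1)))  # -> 26
```

Note that this estimate needs min/max statistics on the time column plus
the interval constant from the function call, which is exactly why a fixed
per-function mapping cardinality cannot capture it.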
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>