From: | Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com> |
---|---|
To: | Tim Kane <tim(dot)kane(at)gmail(dot)com> |
Cc: | pgsql-general General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Possible to improve optimisation / index usage based on domain properties of a function |
Date: | 2014-02-20 06:21:44 |
Message-ID: | CADp-Sm5Sd2NonjxnYBhHvXdx4RTHYPjJM1w82bsjAAe9HMRWxQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
>
>
> Hi all,
>
> Take the following scenario
>
> I have a set of partitions inherited from a parent table, called *streams*
> .
> One of the properties of these tables is a timestamp field, nothing fancy
> about it.
>
> I also have a qualified index on this field.
>
>
> I've noticed that if I perform the following query, the planner will
> correctly use the CHECK constraints to determine the partition, and then
> use the indexes available to retrieve the streams between the specified
> dates.
>
>
> *select count(*) from streams where stream_date >= '2013-01-08' and
> stream_date < '2013-01-09';*
>
> This is correct way of writing such queries.
>
> If however, I was to provide the below query, it uses a sequential scan
> based plan. The planner is unable to utilise any indexes because it can't
> know what the function is going to return - thus unable to constrain the
> range at the time of planning the execution.
>
> *select count(*) from streams where date(stream_date) = '2013-01-08';*
>
>
> This will not use index on stream_date. Nothing unusual in this. Nothing
special with PostgreSQL. None of the RDBMS I have dealt with are smart
enough to transform this query.
> I'm wondering if we could build into postgres some level of metadata
> regarding the properties of a function, such that the optimiser could
> filter against the range of values that the function is expected to return.
>
> In this case, it could deduce that the date function will only ever return
> a value for *stream_date* to within a certain maximum and minimum range.
> Thus the planner could scan the index for all values of *stream_date*falling within +/- 24 hours of the right operand, and then check/re-check
> the results.
>
> If you can't go for the smarter query, go for more optimum index by
"expression based index"
http://www.postgresql.org/docs/9.1/static/indexes-expressional.html
>
> I suspect this would only be suitable for very basic functions, such as
> date(), date_trunc() - I suspect, for any function that reduces cardinality
> to any predictable degree.
>
>
Wrong, there are many expressions which won't use indexes but the moment
you shift the calculation from LHS to RHS, indexes will appear in plan.
This I have seen with at least 3 other RDBMS.
> Thoughts?
>
>
I don't think the RDBMS optimizer should be overloaded with smartness which
is expected from users writing it. If you do it, then there is no end to it.
> Tim
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | James Harper | 2014-02-20 07:15:13 | oid of Datum |
Previous Message | saravanan Chow | 2014-02-20 05:41:00 | Re: configuration error |