From: | Rick Otten <rottenwindfish(at)gmail(dot)com> |
---|---|
To: | Paul McGarry <paul(at)paulmcgarry(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Odd (slow) plan choice with min/max |
Date: | 2021-03-23 13:07:11 |
Message-ID: | CAMAYy4JNU6hSJZc-gQZwKqF1Z2z-jH00ySg+u959gbQZ3GJvKw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Tue, Mar 23, 2021 at 2:52 AM Paul McGarry <paul(at)paulmcgarry(dot)com> wrote:
>
>
> On Tue, 23 Mar 2021 at 16:13, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
>> On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
>> > I have a query where Postgresql (11.9 at the moment) is making an odd
>> plan
>> > choice, choosing to use index scans which require filtering out
>> millions of
>> > rows, rather than "just" doing an aggregate over the rows the where
>> clause
>> > targets which is much faster.
>> > AFAICT it isn't a statistics problem, at least increasing the stats
>> target
>> > and analyzing the table doesn't seem to fix the problem.
>>
>> > explain analyze select min(risk_id),max(risk_id) from risk where
>> > time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
>>
>> I'm guessing the time and ID columns are highly correlated...
>>
>> So the planner thinks it can get the smallest ID by scanning the ID
>> index, but
>> then ends up rejecting the first 161e6 rows for which the time is too
>> low, and
>> fails the >= condition.
>>
>> And thinks it can get the greatest ID by backward scanning the ID idx,
>> but ends
>> up rejecting/filtering the first 41e6 rows, for which the time is too
>> high,
>> failing the < condition.
>>
>
> Yes, the columns are highly correlated, but that alone doesn't seem like
> it should be sufficient criteria to choose this plan.
> Ie the selection criteria (1 day of data about a year ago) has a year+
> worth of data after it and probably a decade of data before it, so anything
> walking a correlated index from top or bottom is going to have to walk past
> a lot of data before it gets to data that fits the criteria.
>
I assume you have a statistic on the correlated columns, ie `create
statistic` ?
If you can't use partitions on your date column, can you use partial
indexes instead? Or a functional index with min() over day and max() over
day?
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Stephens | 2021-03-23 15:21:55 | Re: SQL performance issue (postgresql chooses a bad plan when a better one is available) |
Previous Message | Paul McGarry | 2021-03-23 06:51:49 | Re: Odd (slow) plan choice with min/max |