Re: Odd (slow) plan choice with min/max

From: Paul McGarry <paul(at)paulmcgarry(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Odd (slow) plan choice with min/max
Date: 2021-03-23 06:51:49
Message-ID: CAPrE0SZgRiSvLJrUP5CFKqUxnWqQ08k5zhcE+XJ4yQetKQjZ-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 23 Mar 2021 at 16:13, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> > I have a query where Postgresql (11.9 at the moment) is making an odd
> plan
> > choice, choosing to use index scans which require filtering out millions
> of
> > rows, rather than "just" doing an aggregate over the rows the where
> clause
> > targets which is much faster.
> > AFAICT it isn't a statistics problem, at least increasing the stats
> target
> > and analyzing the table doesn't seem to fix the problem.
>
> > explain analyze select min(risk_id),max(risk_id) from risk where
> > time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
>
> I'm guessing the time and ID columns are highly correlated...
>
> So the planner thinks it can get the smallest ID by scanning the ID index,
> but
> then ends up rejecting the first 161e6 rows for which the time is too low,
> and
> fails the >= condition.
>
> And thinks it can get the greatest ID by backward scanning the ID idx, but
> ends
> up rejecting/filtering the first 41e6 rows, for which the time is too high,
> failing the < condition.
>

Yes, the columns are highly correlated, but that alone doesn't seem like it
should be sufficient criteria to choose this plan.
Ie the selection criteria (1 day of data about a year ago) has a year+
worth of data after it and probably a decade of data before it, so anything
walking a correlated index from top or bottom is going to have to walk past
a lot of data before it gets to data that fits the criteria.

> One solution seems to be to create an index on (i,j), but I don't know if
> there's a better way.
>
>
Adding the count() stops the planner considering the option so that will
work for now.
My colleague has pointed out that we had the same issue in November and I
came up with the count() workaround then too, but somehow seem to have
forgotten it in the meantime and reinvented it today. I wonder if I posted
to pgsql-performance then too.....

Maybe time for me to read the PG12 release notes....

Paul

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rick Otten 2021-03-23 13:07:11 Re: Odd (slow) plan choice with min/max
Previous Message Justin Pryzby 2021-03-23 05:13:03 Re: Odd (slow) plan choice with min/max