Re: Odd (slow) plan choice with min/max

From: Yoan SULTAN <yeyeyo(at)gmail(dot)com>
To: Paul McGarry <paul(at)paulmcgarry(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Odd (slow) plan choice with min/max
Date: 2021-03-24 09:26:12
Message-ID: CAPJFe0Y10nUvdBWf9ti4g2RDojXMwGgSpS5tAn16_1wqWF4DwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Another workaround could be :

explain analyze select min(risk_id),max(risk_id) from (select * from risk
where time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00')
t;

in order to force the planner to use first the timestamp index.

However, I agree with you; we meet a planner bad behavior here.

Regards,
Yoan SULTAN

Le mar. 23 mars 2021 à 22:38, Paul McGarry <paul(at)paulmcgarry(dot)com> a écrit :

>
>
> On Wed, 24 Mar 2021 at 00:07, Rick Otten <rottenwindfish(at)gmail(dot)com> wrote:
>
>>
>>> Yes, the columns are highly correlated, but that alone doesn't seem like
>>> it should be sufficient criteria to choose this plan.
>>> Ie the selection criteria (1 day of data about a year ago) has a year+
>>> worth of data after it and probably a decade of data before it, so anything
>>> walking a correlated index from top or bottom is going to have to walk past
>>> a lot of data before it gets to data that fits the criteria.
>>>
>>
>>
>> I assume you have a statistic on the correlated columns, ie `create
>> statistic` ?
>>
>
> I didn't, but adding
> ======
> CREATE STATISTICS risk_risk_id_time_correlation_stats ON risk_id,time FROM
> risk;
> analyze risk;
> ======
> doesn't seem to help.
> I get the same plan before/after. Second run was faster, but just because
> data was hot.
>
>
>
>> If you can't use partitions on your date column, can you use partial
>> indexes instead? Or a functional index with min() over day and max() over
>> day?
>>
>
> I don't particularly want to add more weird indexes to solve this one
> particular query. as the existing risk_id index should make it efficient
> enough if only the planner chose to use it. This is part of an archiving
> job, identifying sections of historical data, so not a query that needs to
> be super optimised, but essentially doing a full table scan
> backwards/forwards as it is now is doing a lot of unnecessary IO that would
> be best left free for more time sensitive queries.My count(() workaround
> works so we can use that.
> I'm more interested in understanding why the planner makes what seems to
> be an obviously bad choice.
>
> Paul
>

--
Regards,
Yo.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2021-03-25 08:05:43 Re: proposal: schema variables - doc
Previous Message Hannu Krosing 2021-03-23 23:20:49 Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)