From: Yoan SULTAN <yeyeyo(at)gmail(dot)com>
To: Paul McGarry <paul(at)paulmcgarry(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Odd (slow) plan choice with min/max
Date: 2021-03-24 09:26:12
Message-ID: CAPJFe0Y10nUvdBWf9ti4g2RDojXMwGgSpS5tAn16_1wqWF4DwQ@mail.gmail.com
Lists: pgsql-performance
Another workaround could be:

explain analyze
select min(risk_id), max(risk_id)
from (select * from risk
      where time >= '2020-01-20 15:00:07+00'
        and time <  '2020-01-21 15:00:08+00') t;

in order to force the planner to use the timestamp index first.
However, I agree with you; we are seeing bad planner behavior here.
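One caveat (my addition, not tested against this schema): PostgreSQL will normally flatten a plain subquery like that back into the outer query, in which case the plan is unchanged. Adding OFFSET 0 is the classic, if hacky, optimization fence that prevents the pullup:

```sql
-- OFFSET 0 prevents subquery flattening, so min()/max() are computed over
-- the rows returned by the timestamp-index scan instead of triggering the
-- "walk the risk_id index from each end" plan.
explain analyze
select min(risk_id), max(risk_id)
from (select risk_id
      from risk
      where time >= '2020-01-20 15:00:07+00'
        and time <  '2020-01-21 15:00:08+00'
      offset 0) t;
```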
Regards,
Yoan SULTAN
On Tue, 23 Mar 2021 at 22:38, Paul McGarry <paul(at)paulmcgarry(dot)com> wrote:
>
>
> On Wed, 24 Mar 2021 at 00:07, Rick Otten <rottenwindfish(at)gmail(dot)com> wrote:
>
>>
>>> Yes, the columns are highly correlated, but that alone doesn't seem like
>>> it should be sufficient criteria to choose this plan.
>>> Ie the selection criteria (1 day of data about a year ago) has a year+
>>> worth of data after it and probably a decade of data before it, so anything
>>> walking a correlated index from top or bottom is going to have to walk past
>>> a lot of data before it gets to data that fits the criteria.
>>>
>>
>>
>> I assume you have statistics on the correlated columns, i.e. `CREATE
>> STATISTICS`?
>>
>
> I didn't, but adding
> ======
> CREATE STATISTICS risk_risk_id_time_correlation_stats ON risk_id, time FROM risk;
> ANALYZE risk;
> ======
> doesn't seem to help.
> I get the same plan before and after. The second run was faster, but only
> because the data was hot.
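As an aside (my suggestion, not from the earlier messages): the per-column physical correlation the planner relies on here can be inspected directly, assuming the table and column names from the query above:

```sql
-- A correlation near 1.0 or -1.0 means the column's values are laid out
-- almost in physical order on disk, which is what makes the ordered
-- risk_id index scan look cheap to the planner.
select attname, n_distinct, correlation
from pg_stats
where tablename = 'risk'
  and attname in ('risk_id', 'time');
```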
>
>
>
>> If you can't use partitions on your date column, can you use partial
>> indexes instead? Or a functional index with min() over day and max() over
>> day?
>>
>
> I don't particularly want to add more weird indexes to solve this one
> particular query, as the existing risk_id index should make it efficient
> enough if only the planner chose to use it. This is part of an archiving
> job identifying sections of historical data, so it's not a query that
> needs to be super optimised, but essentially doing a full table scan
> backwards/forwards as it does now generates a lot of unnecessary IO that
> would be best left free for more time-sensitive queries. My count(*)
> workaround works, so we can use that.
> I'm more interested in understanding why the planner makes what seems to
> be an obviously bad choice.
>
> Paul
>
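For the archives, the count(*) workaround mentioned above presumably takes a shape like this (a sketch, not the exact query from the earlier message):

```sql
-- min()/max() alone can be satisfied by scanning the risk_id index from
-- each end until the first qualifying row is found; adding count(*) forces
-- every qualifying row to be visited anyway, so the planner falls back to
-- the timestamp index.
explain analyze
select min(risk_id), max(risk_id), count(*)
from risk
where time >= '2020-01-20 15:00:07+00'
  and time <  '2020-01-21 15:00:08+00';
```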
--
Regards,
Yo.