Re: checking for a NULL date in a partitioned table kills performance

From: Sbob <sbob(at)quadratum-braccas(dot)com>
To: Rui DeSousa <rui(dot)desousa(at)icloud(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: checking for a NULL date in a partitioned table kills performance
Date: 2024-08-23 13:47:00
Message-ID: d804b268-4ab2-428b-b08d-e964b828eda8@quadratum-braccas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance


On 8/22/24 6:07 PM, Rui DeSousa wrote:
>
>
>> On Aug 22, 2024, at 7:32 PM, Sbob <sbob(at)quadratum-braccas(dot)com> wrote:
>>
>>
>> On 8/22/24 5:26 PM, Sbob wrote:
>>>
>>> On 8/22/24 5:06 PM, Rui DeSousa wrote:
>>>>
>>>>> On Aug 22, 2024, at 5:44 PM, Sbob <sbob(at)quadratum-braccas(dot)com> wrote:
>>>>>
>>>>> All;
>>>>>
>>>>> I am running a select from a partitioned table. The table (and all
>>>>> the partitions) have an index on contract_date like this:
>>>>> CREATE INDEX on part_tab (contract_date) where contract_date >
>>>>> '2022-01-01'::date
>>>>>
>>>>> The table (including all partitions) has 32million rows
>>>>> The db server is an aurora postgresql instance with 128GB of ram
>>>>> and 16 vcpu's
>>>>>
>>>>> The shared buffers is set to 90GB and effective_cache_size is also
>>>>> 90GB
>>>>> I set default_statistics_target to 1000 and ram a vacuum analyze
>>>>> on the table
>>>>>
>>>>> I am selecting a number of columns and specifying this where clause:
>>>>>
>>>>> WHERE (
>>>>>                      (contract_date IS NULL)
>>>>>                      OR
>>>>>                      (contract_date > '2022-01-01'::date)
>>>>>               )
>>>>>
>>>>> This takes 15 seconds to run and an explain says it's doing a
>>>>> table scan on all partitions (the query is not specifying the
>>>>> partition key)
>>>>> If I change the where clause to look like this:
>>>>>
>>>>> WHERE (
>>>>>                    (contract_date > '2022-01-01'::date)
>>>>>               )
>>>>>
>>>>> Then it performs index scans on all the partitions and runs in
>>>>> about 600ms
>>>>>
>>>>> If i leave the where clause off entirely it performs table scans
>>>>> of the partitions and takes approx 18 seconds to run
>>>>>
>>>>> I am trying to get the performance to less than 2sec,
>>>>> I have tried adding indexes on the table and all partitions like this:
>>>>> CREATE INDEX ON table (contract_date NULLS FIRST) ;
>>>>> but the performance with the full where clause is the same:
>>>>>
>>>>> WHERE (
>>>>>                      (contract_date IS NULL)
>>>>>                      OR
>>>>>                      (contract_date > '2022-01-01'::date)
>>>>>               )
>>>>>
>>>>> runs in 15 seconds and scans all partitions
>>>>>
>>>>> I also tried indexes i=on the table and all partitions like this:
>>>>> CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL;
>>>>>
>>>>> but I get the same result, table scans on all partitions and it
>>>>> runs in 15 seconds
>>>>>
>>>>> Any help or advice ?
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>>
>>>> What is contract_date and when will it be null?
>>>
>>>
>>> it's a date data type and it allows NULL's not sure why, this is a
>>> client's system
>>>
>>>
>> 29 million of the 32 million rows in the table have NULL for
>> contract_date
>>
>
> NULLs are not indexed thus the OR predicate invalidate the use of the
> index.
>
> Since you are already creating a partial index just include the NULLs.
>  It index will get used for both of your queries.
>
> create index table_idx1
>   on table (contract_date)
>   where contract_date > ‘1/1/2022’
>       or contract_date is null
> ;
>
>
> The reason why I asked when is contract_date null is because
> attributes in a table should be non nullable.  If it’s nullable then
> that begs the question if it belong in that table in the first place;
> and sometimes the answer is yes.  I just see a lot of half baked
> schemas out there.  I refer to them as organically designed schemas.
>
> -Rui.
>
I agree, I will find out from the client

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Scott Ribe 2024-08-23 15:17:19 Re: checking for a NULL date in a partitioned table kills performance
Previous Message Doug Reynolds 2024-08-23 11:39:05 Re: checking for a NULL date in a partitioned table kills performance

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Ribe 2024-08-23 15:17:19 Re: checking for a NULL date in a partitioned table kills performance
Previous Message Doug Reynolds 2024-08-23 11:39:05 Re: checking for a NULL date in a partitioned table kills performance