Re: checking for a NULL date in a partitioned table kills performance

From: Rui DeSousa <rui(dot)desousa(at)icloud(dot)com>
To: Sbob <sbob(at)quadratum-braccas(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: checking for a NULL date in a partitioned table kills performance
Date: 2024-08-23 00:07:55
Message-ID: 7D59CDD1-4B8F-460F-9C2B-3DCE9FA7FBB5@icloud.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

> On Aug 22, 2024, at 7:32 PM, Sbob <sbob(at)quadratum-braccas(dot)com> wrote:
>
>
> On 8/22/24 5:26 PM, Sbob wrote:
>>
>> On 8/22/24 5:06 PM, Rui DeSousa wrote:
>>>
>>>> On Aug 22, 2024, at 5:44 PM, Sbob <sbob(at)quadratum-braccas(dot)com> wrote:
>>>>
>>>> All;
>>>>
>>>> I am running a select from a partitioned table. The table (and all the partitions) have an index on contract_date like this:
>>>> CREATE INDEX on part_tab (contract_date) where contract_date > '2022-01-01'::date
>>>>
>>>> The table (including all partitions) has 32million rows
>>>> The db server is an aurora postgresql instance with 128GB of ram and 16 vcpu's
>>>>
>>>> The shared buffers is set to 90GB and effective_cache_size is also 90GB
>>>> I set default_statistics_target to 1000 and ram a vacuum analyze on the table
>>>>
>>>> I am selecting a number of columns and specifying this where clause:
>>>>
>>>> WHERE (
>>>> (contract_date IS NULL)
>>>> OR
>>>> (contract_date > '2022-01-01'::date)
>>>> )
>>>>
>>>> This takes 15 seconds to run and an explain says it's doing a table scan on all partitions (the query is not specifying the partition key)
>>>> If I change the where clause to look like this:
>>>>
>>>> WHERE (
>>>> (contract_date > '2022-01-01'::date)
>>>> )
>>>>
>>>> Then it performs index scans on all the partitions and runs in about 600ms
>>>>
>>>> If i leave the where clause off entirely it performs table scans of the partitions and takes approx 18 seconds to run
>>>>
>>>> I am trying to get the performance to less than 2sec,
>>>> I have tried adding indexes on the table and all partitions like this:
>>>> CREATE INDEX ON table (contract_date NULLS FIRST) ;
>>>> but the performance with the full where clause is the same:
>>>>
>>>> WHERE (
>>>> (contract_date IS NULL)
>>>> OR
>>>> (contract_date > '2022-01-01'::date)
>>>> )
>>>>
>>>> runs in 15 seconds and scans all partitions
>>>>
>>>> I also tried indexes i=on the table and all partitions like this:
>>>> CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL;
>>>>
>>>> but I get the same result, table scans on all partitions and it runs in 15 seconds
>>>>
>>>> Any help or advice ?
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>> What is contract_date and when will it be null?
>>
>>
>> it's a date data type and it allows NULL's not sure why, this is a client's system
>>
>>
> 29 million of the 32 million rows in the table have NULL for contract_date
>

NULLs are not indexed thus the OR predicate invalidate the use of the index.

Since you are already creating a partial index just include the NULLs. It index will get used for both of your queries.

create index table_idx1
on table (contract_date)
where contract_date > ‘1/1/2022’
or contract_date is null
;

The reason why I asked when is contract_date null is because attributes in a table should be non nullable. If it’s nullable then that begs the question if it belong in that table in the first place; and sometimes the answer is yes. I just see a lot of half baked schemas out there. I refer to them as organically designed schemas.

-Rui.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Muhammad Usman Khan 2024-08-23 04:40:28 Re: checking for a NULL date in a partitioned table kills performance
Previous Message Tom Lane 2024-08-23 00:05:19 Re: checking for a NULL date in a partitioned table kills performance

Browse pgsql-performance by date

  From Date Subject
Next Message Muhammad Usman Khan 2024-08-23 04:40:28 Re: checking for a NULL date in a partitioned table kills performance
Previous Message Tom Lane 2024-08-23 00:05:19 Re: checking for a NULL date in a partitioned table kills performance