| From: | Rui DeSousa <rui(dot)desousa(at)icloud(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Sbob <sbob(at)quadratum-braccas(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: checking for a NULL date in a partitioned table kills performance |
| Date: | 2024-08-23 09:54:51 |
| Message-ID: | 9FBDD0F7-18D1-4AEE-996D-F0AF90707790@icloud.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-admin pgsql-performance |
> On Aug 22, 2024, at 8:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Sbob <sbob(at)quadratum-braccas(dot)com> writes:
>> 29 million of the 32 million rows in the table have NULL for contract_date
>
> [ blink... ] So your query is selecting at least 29/32nds of the
> table, plus however much matches the contract_date > '2022-01-01'
> alternative. I'm not sure how you expect that to be significantly
> cheaper than scanning the whole table.
>
> regards, tom lane
^^^ This is best answer; however, what is the actual query and how is it used? I assume it’s a analytical query and not actually extracting the rows. Is for a dashboard or an ad-hoc query?
Here’s simple example; from 4/1 is uses the index 3/1 it does a full table scan. Depending on what you using the query for you could use a covered index or a materialized view.
prod=# create index emp_idx3 on emp (contract_date);
CREATE INDEX
Time: 6036.030 ms (00:06.036)
prod=# select sum(site_id) from emp where contract_date > '4/1/2024';
sum
-----------
927473447
(1 row)
Time: 711.774 ms
prod=# select sum(site_id) from emp where contract_date > '3/1/2024';
sum
------------
1128971203
(1 row)
Time: 1945.397 ms (00:01.945)
prod=# select sum(site_id) from emp where contract_date > '3/1/2024' or contract_date is null;
sum
------------
3823075309
(1 row)
Time: 1821.284 ms (00:01.821)
prod=# explain select sum(site_id) from emp where contract_date > '4/1/2024';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=754070.31..754070.32 rows=1 width=8)
-> Gather (cost=754069.59..754070.30 rows=7 width=8)
Workers Planned: 7
-> Partial Aggregate (cost=753069.59..753069.60 rows=1 width=8)
-> Parallel Bitmap Heap Scan on emp (cost=5343.20..752867.80 rows=80715 width=4)
Recheck Cond: (contract_date > '2024-04-01'::date)
-> Bitmap Index Scan on emp_idx3 (cost=0.00..5201.95 rows=565002 width=0)
Index Cond: (contract_date > '2024-04-01'::date)
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
(11 rows)
Time: 1.196 ms
prod=# explain select sum(site_id) from emp where contract_date > '3/1/2024';
QUERY PLAN
---------------------------------------------------------------------------------------
Finalize Aggregate (cost=764566.90..764566.91 rows=1 width=8)
-> Gather (cost=764566.18..764566.89 rows=7 width=8)
Workers Planned: 7
-> Partial Aggregate (cost=763566.18..763566.19 rows=1 width=8)
-> Parallel Seq Scan on emp (cost=0.00..763320.15 rows=98411 width=4)
Filter: (contract_date > '2024-03-01'::date)
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
(9 rows)
Time: 1.172 ms
prod=# drop index emp_idx3;
DROP INDEX
Time: 8.663 ms
prod=# create index emp_idx3 on emp (contract_date) include (site_id);
CREATE INDEX
Time: 7002.860 ms (00:07.003)
prod=# select sum(site_id) from emp where contract_date > '4/1/2024';
sum
-----------
927473447
(1 row)
Time: 56.450 ms
prod=# select sum(site_id) from emp where contract_date > '3/1/2024';
sum
------------
1128971203
(1 row)
Time: 49.115 ms
prod=# select sum(site_id) from emp where contract_date > '3/1/2024' or contract_date is null;
sum
------------
3823075309
(1 row)
Time: 702.962 ms
prod=# explain select sum(site_id) from emp where contract_date > '3/1/2024' or contract_date is null;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=216035.47..216035.48 rows=1 width=8)
-> Gather (cost=216034.74..216035.45 rows=7 width=8)
Workers Planned: 7
-> Partial Aggregate (cost=215034.74..215034.75 rows=1 width=8)
-> Parallel Index Only Scan using emp_idx3 on emp (cost=0.44..213686.81 rows=539174 width=4)
Filter: ((contract_date > '2024-03-01'::date) OR (contract_date IS NULL))
JIT:
Functions: 5
Options: Inlining false, Optimization false, Expressions true, Deforming true
(9 rows)
Time: 0.972 ms
prod=# select count(*) from emp;
count
----------
16862243
(1 row)
Time: 629.995 ms
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Doug Reynolds | 2024-08-23 11:39:05 | Re: checking for a NULL date in a partitioned table kills performance |
| Previous Message | shammat | 2024-08-23 06:45:44 | Re: checking for a NULL date in a partitioned table kills performance |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Doug Reynolds | 2024-08-23 11:39:05 | Re: checking for a NULL date in a partitioned table kills performance |
| Previous Message | shammat | 2024-08-23 06:45:44 | Re: checking for a NULL date in a partitioned table kills performance |