Why is a sort required for this query? (IS NULL predicate on leading key column)

From: Jerry Brenner <jbrenner(at)guidewire(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Why is a sort required for this query? (IS NULL predicate on leading key column)
Date: 2024-01-17 14:39:06
Message-ID: CACoKFYRfJjpaG+yXAZXdTm7b7UAk4UQwV3-b2H_nmYUMS1TBgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We are on 13.9.
I'm wondering why a sort is required for this query, as the index should be
providing the required ordering to satisfy the ORDER BY clause. Does it
have to do with the IS NULL predicate on the leading key column in the
index?

There's an index, job_u_closedate_g9cdc6ghupib, on pc_job(CloseDate,
Retired, Subtype, CreateTime, ID). All columns have ASC sort order and
NULLs LAST.

- pc_job is the probe table in a hash join
- There are IS NULL and equality predicates on the 3 leading columns in
the index and the last 2 key columns (CreateTime, ID) are the ordering
columns in the query
- So, the Index Scan of job_u_closedate_g9cdc6ghupib is returning the
rows in the sorted order
- NOTE: The sort is cheap, but I'm investigating this because "CloseDate
IS NULL" is very selective and without forcing the index the optimizer is
choosing a different sort avert index that does not include CloseDate and
hence a lot of time is spent filtering out rows on that predicate against
the heap.

Here's the query

SELECT /* ISNULL:pc_job.CloseDate:, KeyTable:pc_job; */ gRoot.ID col0,
gRoot.Subtype col1, gRoot.CreateTime col2
FROM pc_job gRoot INNER JOIN pc_policy policy_0
ON policy_0.ID = gRoot.PolicyID
WHERE gRoot.Subtype = 7 AND gRoot.CloseDate IS NULL
AND gRoot.Retired = 0
AND policy_0.ProducerCodeOfServiceID IN

(248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719)
AND policy_0.Retired = 0
ORDER BY col2 ASC, col0 ASC LIMIT 10

Here's the query plan:

Limit (cost=107826.77..107826.79 rows=10 width=20) (actual
time=13149.872..13149.877 rows=10 loops=1)
Buffers: shared hit=2756 read=40121
I/O Timings: read=105917.908
-> Sort (cost=107826.77..107827.72 rows=381 width=20) (actual
time=13149.871..13149.874 rows=10 loops=1)
Sort Key: groot.createtime, groot.id
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=2756 read=40121
I/O Timings: read=105917.908
-> Hash Join (cost=15632.51..107818.53 rows=381 width=20)
(actual time=578.511..13149.658 rows=144 loops=1)
Buffers: shared hit=2750 read=40121
I/O Timings: read=105917.908
-> Index Scan using job_u_closedate_g9cdc6ghupib on
pc_job groot (cost=0.56..91783.14 rows=153696 width=28) (actual
time=3.864..12562.568 rows=75558 loops=1)
Index Cond: ((groot.closedate IS NULL) AND
(groot.retired = 0) AND (groot.subtype = 7))
Buffers: shared hit=2721 read=27934
I/O Timings: read=58781.220
-> Hash (cost=15427.92..15427.92 rows=16322 width=8)
(actual time=543.298..543.299 rows=13016 loops=1)
Buffers: shared hit=29 read=12187
I/O Timings: read=47136.688
-> Index Scan using
policy_n_producerco_3e8i0ojsyckhx on pc_policy policy_0
(cost=0.43..15427.92 rows=16322 width=8) (actual time=6.149..540.501
rows=13016 loops=1)
Index Cond:
((policy_0.producercodeofserviceid = ANY
('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[]))
AND (policy_0.retired = 0))
Buffers: shared hit=29 read=12187
I/O Timings: read=47136.688
Planning time: 0.538 ms
Execution time: 13150.301 ms

Thanks,

Jerry

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jerry Brenner 2024-01-17 14:48:54 Re: Why is a sort required for this query? (IS NULL predicate on leading key column)
Previous Message Clemens Eisserer 2024-01-11 17:48:35 Re: Selection not "pushed down into" CTE