Re: Why is a sort required for this query? (IS NULL predicate on leading key column)

From: Jerry Brenner <jbrenner(at)guidewire(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Why is a sort required for this query? (IS NULL predicate on leading key column)
Date: 2024-01-17 14:48:54
Message-ID: CACoKFYRL66sx14TknSO2OP4BO4Dp=wMNE=EQzVYqbppFBaqUBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Apologies for not including this in the original email. The other
index, job_u_createtime_2cy0wgyqpani8,
is on pc_job(CreateTime, Retired, Subtype, ID). The optimizer chooses
Nested Loop when choosing that index, vs Hash Join when choosing the index
in the first plan that I posted. It seems like the choice of the Hash Join
in the 1st plan that I posted is collateral damage from the seemingly
unnecessary need to do the sort.

Here's the plan without forcing the index:

Limit (cost=1.00..52692.73 rows=10 width=20) (actual
time=55219.289..87704.704 rows=10 loops=1)
Buffers: shared hit=9579294 read=328583
I/O Timings: read=1157740.299
-> Nested Loop (cost=1.00..2007555.82 rows=381 width=20) (actual
time=55219.288..87704.695 rows=10 loops=1)
Buffers: shared hit=9579294 read=328583
I/O Timings: read=1157740.299
-> Index Scan using job_u_createtime_2cy0wgyqpani8 on pc_job groot
(cost=0.56..1800117.94 rows=153696 width=28) (actual
time=102.075..79470.670 rows=5650 loops=1)
Index Cond: ((groot.retired = 0) AND (groot.subtype = 7))
Filter: (groot.closedate IS NULL)
Rows Removed by Filter: 14994857
Buffers: shared hit=9563981 read=321566
I/O Timings: read=1149579.949
-> Index Scan using pc_policy_pk on pc_policy policy_0
(cost=0.43..1.35 rows=1 width=8) (actual time=1.456..1.456 rows=0
loops=5650)
Index Cond: (policy_0.id = groot.policyid)
Filter: ((policy_0.retired = 0) AND
(policy_0.producercodeofserviceid = ANY
('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])))
Rows Removed by Filter: 1
Buffers: shared hit=15313 read=7017
I/O Timings: read=8160.350
Planning time: 2.209 ms
Execution time: 87705.116 ms

Thanks,
Jerry

On Wed, Jan 17, 2024 at 6:39 AM Jerry Brenner <jbrenner(at)guidewire(dot)com>
wrote:

> We are on 13.9.
> I'm wondering why a sort is required for this query, as the index should
> be providing the required ordering to satisfy the ORDER BY clause. Does it
> have to do with the IS NULL predicate on the leading key column in the
> index?
>
> There's an index, job_u_closedate_g9cdc6ghupib, on pc_job(CloseDate,
> Retired, Subtype, CreateTime, ID). All columns have ASC sort order and
> NULLs LAST.
>
> - pc_job is the probe table in a hash join
> - There are IS NULL and equality predicates on the 3 leading columns
> in the index and the last 2 key columns (CreateTime, ID) are the ordering
> columns in the query
> - So, the Index Scan of job_u_closedate_g9cdc6ghupib is returning the
> rows in the sorted order
> - NOTE: The sort is cheap, but I'm investigating this because
> "CloseDate IS NULL" is very selective and without forcing the index the
> optimizer is choosing a different sort avert index that does not include
> CloseDate and hence a lot of time is spent filtering out rows on that
> predicate against the heap.
>
> Here's the query
>
> SELECT /* ISNULL:pc_job.CloseDate:, KeyTable:pc_job; */ gRoot.ID col0,
> gRoot.Subtype col1, gRoot.CreateTime col2
> FROM pc_job gRoot INNER JOIN pc_policy policy_0
> ON policy_0.ID = gRoot.PolicyID
> WHERE gRoot.Subtype = 7 AND gRoot.CloseDate IS NULL
> AND gRoot.Retired = 0
> AND policy_0.ProducerCodeOfServiceID IN
>
> (248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719)
> AND policy_0.Retired = 0
> ORDER BY col2 ASC, col0 ASC LIMIT 10
>
> Here's the query plan:
>
> Limit (cost=107826.77..107826.79 rows=10 width=20) (actual time=13149.872..13149.877 rows=10 loops=1)
> Buffers: shared hit=2756 read=40121
> I/O Timings: read=105917.908
> -> Sort (cost=107826.77..107827.72 rows=381 width=20) (actual time=13149.871..13149.874 rows=10 loops=1)
> Sort Key: groot.createtime, groot.id
> Sort Method: top-N heapsort Memory: 25kB
> Buffers: shared hit=2756 read=40121
> I/O Timings: read=105917.908
> -> Hash Join (cost=15632.51..107818.53 rows=381 width=20) (actual time=578.511..13149.658 rows=144 loops=1)
> Buffers: shared hit=2750 read=40121
> I/O Timings: read=105917.908
> -> Index Scan using job_u_closedate_g9cdc6ghupib on pc_job groot (cost=0.56..91783.14 rows=153696 width=28) (actual time=3.864..12562.568 rows=75558 loops=1)
> Index Cond: ((groot.closedate IS NULL) AND (groot.retired = 0) AND (groot.subtype = 7))
> Buffers: shared hit=2721 read=27934
> I/O Timings: read=58781.220
> -> Hash (cost=15427.92..15427.92 rows=16322 width=8) (actual time=543.298..543.299 rows=13016 loops=1)
> Buffers: shared hit=29 read=12187
> I/O Timings: read=47136.688
> -> Index Scan using policy_n_producerco_3e8i0ojsyckhx on pc_policy policy_0 (cost=0.43..15427.92 rows=16322 width=8) (actual time=6.149..540.501 rows=13016 loops=1)
> Index Cond: ((policy_0.producercodeofserviceid = ANY ('{248,1092,1848,74101,103158,103159,117402,122618,129215,132420,135261,137719}'::bigint[])) AND (policy_0.retired = 0))
> Buffers: shared hit=29 read=12187
> I/O Timings: read=47136.688
> Planning time: 0.538 ms
> Execution time: 13150.301 ms
>
> Thanks,
>
> Jerry
>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2024-01-17 15:11:17 Re: Why is a sort required for this query? (IS NULL predicate on leading key column)
Previous Message Jerry Brenner 2024-01-17 14:39:06 Why is a sort required for this query? (IS NULL predicate on leading key column)