Re: [EXT] YNT: Need help tuning a query

From: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "Dirschel, Steve" <steve(dot)dirschel(at)thomsonreuters(dot)com>
Cc: Mehmet Çakoğlu <mehmetcakoglu(at)gmail(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>, "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
Subject: Re: [EXT] YNT: Need help tuning a query
Date: 2023-09-29 06:33:21
Message-ID: CAB=Je-G7j5yaXdAOe6pxzMsVf-Mcxg77tfWBQM=9eV=rDJ6yRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Steve,

It looks like PostgreSQL was not able to push the join condition into group
by subquery.

The problematic bits in the PostgreSQL plan are

-> Nested Loop Left Join (cost=3423317.32..4522882.01 rows=4
width=261) (actual time=30839.132..35117.682 rows=39 loops=1)
Join Filter: (sub2.judge_id = judg1.judge_id)
Rows Removed by Join Filter: 858300

-> Nested Loop Left Join (cost=1504656.29..2084303.10
rows=2 width=205) (actual time=19156.871..21775.159 rows=3 loops=1)
Join Filter: (sub4.judge_id = judg1.judge_id)
Rows Removed by Join Filter: 55263

In other words, it did not use sub2.judge_id = judg1.judge_id condition for
efficient data retrieval, and it selected all the data and then filtered.

I believe you could workaround the issue by using LATERAL subqueries, so
you could manually push join conditions into the group by subqueries.
See
https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL

Frankly speaking, it looks strange that you have "group by .. limit 11" at
the very end of the query.
There's no "order by", so it effectively means "return 11 random rows".
I would suggest adding order by if the order is important, or remove "limit
11" and apply the same "lateral" trick for the third subquery as well.
Technically speaking, you can have limit within lateral.

----

I would suggest trying the following:

--- original.sql 2023-09-29 09:20:09
+++ tuned.sql 2023-09-29 09:25:35
@@ -83,6 +83,7 @@
AND (judg1.jrt_opt_out_flag <> 'Y' OR
judg1.jrt_opt_out_flag IS NULL)
) sub0
LEFT OUTER JOIN
+ LATERAL
( SELECT sub4.case_year_number,
sub4.judge_wld_id,
sub4.judge_id,
@@ -94,6 +95,7 @@
'A' AS grouping_flg,
jrtf1.case_document_id AS subcount
FROM wln_mart.jrt_fact jrtf1
+ WHERE jrtf1.judge_id = sub0.judge_id
GROUP BY jrtf1.case_year_number,
jrtf1.higher_judge_wld_id,
jrtf1.higher_judge_id,
@@ -106,6 +108,7 @@
) sub1
ON sub1.judge_id = sub0.judge_id
LEFT OUTER JOIN
+ LATERAL
(SELECT sub5.case_year_number,
sub5.judge_wld_id,
sub5.judge_id,
@@ -121,6 +124,7 @@
FROM wln_mart.jrt_fact jrtf2
WHERE jrtf2.lower_judge_court_level_id > 1000
AND jrtf2.lower_judge_court_level_id <= 1004
+ AND jrtf2.judge_id = sub0.judge_id
GROUP BY jrtf2.case_year_number,
jrtf2.lower_judge_wld_id,
jrtf2.lower_judge_id,

Could you please share explain (analyze, buffers) for the tuned query?

SELECT agg_sub.judge_id,
agg_sub.display_name,
agg_sub.active_flag,
agg_sub.judge_court_level,
agg_sub.jrt_fact_first_year_trial,
agg_sub.jrt_fact_last_year_trial,
agg_sub.jrt_fact_totalcount_trial,
agg_sub.filtered_first_year_trial,
agg_sub.filtered_last_year_trial,
agg_sub.jrt_fact_count_trial,
agg_sub.jrt_fact_first_year_appeal,
agg_sub.jrt_fact_last_year_appeal,
agg_sub.jrt_fact_totalcount_appeal,
agg_sub.filtered_first_year_appeal,
agg_sub.filtered_last_year_appeal,
agg_sub.jrt_fact_count_appeal,
appellate_flag_sub.appellate_flag
FROM (SELECT sub3.judge_id,
sub3.display_name,
sub3.active_flag,
sub3.judge_court_level,
(MIN(sub3.trial_unfilt_case_year_number)) AS
jrt_fact_first_year_trial,
(MAX(sub3.trial_unfilt_case_year_number)) AS
jrt_fact_last_year_trial,
(SUM(sub3.trial_unfilt_subcount)) AS jrt_fact_totalcount_trial,
(MIN(sub3.trial_filt_case_year_number)) AS
filtered_first_year_trial,
(MAX(sub3.trial_filt_case_year_number)) AS
filtered_last_year_trial,
(SUM(sub3.trial_filt_subcount)) AS jrt_fact_count_trial,
(MIN(sub3.appeal_unfilt_case_year_number)) AS
jrt_fact_first_year_appeal,
(MAX(sub3.appeal_unfilt_case_year_number)) AS
jrt_fact_last_year_appeal,
(SUM(sub3.appeal_unfilt_subcount)) AS
jrt_fact_totalcount_appeal,
(MIN(sub3.appeal_filt_case_year_number)) AS
filtered_first_year_appeal,
(MAX(sub3.appeal_filt_case_year_number)) AS
filtered_last_year_appeal,
(SUM(sub3.appeal_filt_subcount)) AS jrt_fact_count_appeal
FROM (SELECT sub0.judge_id,
sub0.display_name,
sub0.active_flag,
sub0.judge_court_level,
(CASE WHEN sub2.grouping_flg = 'T' AND
sub2.judge_wld_id = sub0.judge_wld_id THEN sub2.case_year_number ELSE NULL
END) AS trial_unfilt_case_year_number,
(CASE WHEN sub2.grouping_flg = 'T'AND sub2.judge_wld_id
= sub0.judge_wld_id THEN sub2.subcount ELSE NULL END
) AS trial_unfilt_subcount,
(CASE WHEN sub2.grouping_flg = 'T' AND
sub2.judge_wld_id = sub0.judge_wld_id THEN (CASE WHEN sub2.case_year_number
BETWEEN sub0.low_case_year_number AND sub0.high_case_year_number
THEN sub2.case_year_number ELSE NULL END) ELSE
NULL END) AS trial_filt_case_year_number,
(CASE WHEN sub2.grouping_flg = 'T' AND
sub2.judge_wld_id = sub0.judge_wld_id THEN (CASE WHEN sub2.case_year_number
BETWEEN sub0.low_case_year_number AND sub0.high_case_year_number
THEN sub2.subcount ELSE NULL END ) ELSE
NULL END ) AS trial_filt_subcount,
(CASE WHEN sub1.grouping_flg = 'A'AND sub1.judge_wld_id
= sub0.judge_wld_id THEN sub1.case_year_number WHEN sub2.grouping_flg = 'A'
AND sub2.judge_wld_id = sub0.judge_wld_id
THEN sub2.case_year_number ELSE NULL END
) AS appeal_unfilt_case_year_number,
(
CASE WHEN sub1.grouping_flg = 'A'
AND sub1.judge_wld_id = sub0.judge_wld_id THEN
sub1.subcount WHEN sub2.grouping_flg = 'A'
AND sub2.judge_wld_id = sub0.judge_wld_id THEN
sub2.subcount ELSE NULL END
) AS appeal_unfilt_subcount,
(
CASE WHEN sub1.grouping_flg = 'A'
AND sub1.judge_wld_id = sub0.judge_wld_id THEN (
CASE WHEN sub1.case_year_number BETWEEN
sub0.low_case_year_number
AND sub0.high_case_year_number THEN
sub1.case_year_number ELSE NULL END
) WHEN sub2.grouping_flg = 'A'
AND sub2.judge_wld_id = sub0.judge_wld_id THEN (
CASE WHEN sub2.case_year_number BETWEEN
sub0.low_case_year_number
AND sub0.high_case_year_number THEN
sub2.case_year_number ELSE NULL END
) ELSE NULL END
) AS appeal_filt_case_year_number,
(
CASE WHEN sub1.grouping_flg = 'A'
AND sub1.judge_wld_id = sub0.judge_wld_id THEN (
CASE WHEN sub1.case_year_number BETWEEN
sub0.low_case_year_number
AND sub0.high_case_year_number THEN
sub1.subcount ELSE NULL END
) WHEN sub2.grouping_flg = 'A'
AND sub2.judge_wld_id = sub0.judge_wld_id THEN (
CASE WHEN sub2.case_year_number BETWEEN
sub0.low_case_year_number
AND sub0.high_case_year_number THEN
sub2.subcount ELSE NULL END
) ELSE NULL END
) AS appeal_filt_subcount
FROM ( SELECT 0104119201 AS judge_wld_id,
2013 AS low_case_year_number,
2023 AS high_case_year_number,
judg1.judge_id,
judg1.display_name,
judg1.active_flag,
judg1.judge_court_level
FROM wln_mart.judge judg1
WHERE judg1.wld_id = 01041192
AND judg1.profile_id = 01
AND (judg1.jrt_opt_out_flag <> 'Y' OR
judg1.jrt_opt_out_flag IS NULL)
) sub0
LEFT OUTER JOIN
LATERAL
( SELECT sub4.case_year_number,
sub4.judge_wld_id,
sub4.judge_id,
sub4.grouping_flg,
COUNT(*) AS subcount
FROM (SELECT jrtf1.case_year_number,
jrtf1.higher_judge_wld_id AS judge_wld_id,
jrtf1.higher_judge_id AS judge_id,
'A' AS grouping_flg,
jrtf1.case_document_id AS subcount
FROM wln_mart.jrt_fact jrtf1
WHERE jrtf1.judge_id = sub0.judge_id
GROUP BY jrtf1.case_year_number,
jrtf1.higher_judge_wld_id,
jrtf1.higher_judge_id,
jrtf1.case_document_id
) sub4
GROUP BY sub4.case_year_number,
sub4.judge_wld_id,
sub4.judge_id,
sub4.grouping_flg
) sub1
ON sub1.judge_id = sub0.judge_id
LEFT OUTER JOIN
LATERAL
(SELECT sub5.case_year_number,
sub5.judge_wld_id,
sub5.judge_id,
sub5.grouping_flg,
COUNT(*) AS subcount
FROM (SELECT jrtf2.case_year_number,
jrtf2.lower_judge_wld_id AS judge_wld_id,
jrtf2.lower_judge_id AS judge_id,
(
CASE WHEN
jrtf2.lower_judge_court_level_id = 1004 THEN 'T' ELSE 'A' END
) AS grouping_flg,
jrtf2.case_document_id AS subcount
FROM wln_mart.jrt_fact jrtf2
WHERE jrtf2.lower_judge_court_level_id > 1000
AND jrtf2.lower_judge_court_level_id <= 1004
AND jrtf2.judge_id = sub0.judge_id
GROUP BY jrtf2.case_year_number,
jrtf2.lower_judge_wld_id,
jrtf2.lower_judge_id,
(
CASE WHEN
jrtf2.lower_judge_court_level_id = 1004 THEN 'T' ELSE 'A' END
),
jrtf2.case_document_id
) sub5
GROUP BY sub5.case_year_number,
sub5.judge_wld_id,
sub5.judge_id,
sub5.grouping_flg
) sub2
ON sub2.judge_id = sub0.judge_id
) sub3
GROUP BY sub3.judge_id,
sub3.display_name,
sub3.active_flag,
sub3.judge_court_level
) agg_sub,
(SELECT judge_id,
(CASE WHEN (SUM (appellate_flag) < (COUNT(*) / 2)) THEN 0 ELSE 1
END ) AS appellate_flag
FROM (SELECT DISTINCT jrtf.case_year_number,
jrtf.case_document_id,
jrtf.lower_judge_id,
jrtf.higher_judge_id,
(
CASE WHEN (
lower_judge_wld_id = 0104119201
AND lower_judge_court_level_id = 1004
) THEN 0 ELSE 1 END ) AS appellate_flag,
(
CASE WHEN lower_judge_wld_id = 0104119201 THEN
lower_judge_id ELSE higher_judge_id END
) AS judge_id
FROM wln_mart.JRT_FACT jrtf
WHERE LOWER_JUDGE_WLD_ID = 0104119201
OR HIGHER_JUDGE_WLD_ID = 0104119201
ORDER BY case_year_number DESC
) sub0
GROUP BY judge_id
LIMIT 11
) appellate_flag_sub
WHERE
Agg_sub.judge_id = appellate_flag_sub.judge_id;

Vladimir

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vladimir Sitnikov 2023-09-29 06:46:46 Re: [EXT] YNT: Need help tuning a query
Previous Message Raivo Rebane 2023-09-29 05:23:13 Re: Right version of jdbc