Re: Is there any good optimization solution to improve the query efficiency?

From: Lorusso Domenico <domenico(dot)l76(at)gmail(dot)com>
To: gzh <gzhcoder(at)126(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Is there any good optimization solution to improve the query efficiency?
Date: 2023-06-06 21:13:03
Message-ID: CAJMpnG6DD2gXZ5KFC7Rk3RRQXOucYs945EJnq=6J3TnQoyPg3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm happy to help.
Looking the explanation the first with on t_res goes in parallel full table
scan... this is an issue.
Should be present an index on the temporale period (I'm just looking for
the same problem)

Il giorno mar 6 giu 2023 alle ore 10:33 gzh <gzhcoder(at)126(dot)com> ha scritto:

> I made some slight changes to the SQL you provided, but the optimization
> approach remained the same.
>
> I was surprised that the results were retrieved in less than one second.
> It's really impressive!
>
> Below is the execution plan. Thank you very much for providing the
> optimization method, I learned a lot from it.
>
>
> explain analyse
>
> with t_res as
>
> (select RSNO, KNO, CRSNO
>
> from tbl_res
>
> where tbl_res.CID >= to_date('2022/07/01', 'YYYY/MM/DD')
>
> and tbl_res.CID <= to_date('2022/07/31', 'YYYY/MM/DD')
>
> and tbl_res.COD >= to_date('2022/07/01', 'YYYY/MM/DD')
>
> and tbl_res.COD <= to_date('2022/07/31', 'YYYY/MM/DD')),
>
> t_pov2 as
>
> (select t_cust.RSNO, t_cust.KNO, MIN(t_cust.GSTSEQ) GSTSEQ
>
> from t_res -- this is tbl_res already filter by date
>
> inner join tbl_cust t_cust
>
> on t_res.RSNO = t_cust.RSNO
>
> inner join tbl_pov t_pov
>
> on t_pov.CRSNO = t_res.CRSNO -- why you use this table? it doesn't
> seem to be used to extract data. Are you trying to extract data from t_res
> that have at least a record in t_pov? in this case could work better move
> this join in the first with (using distinct or group by to ensure there
> will be just a record for RSNO and KNO)
>
> where t_cust.STSFLG = 'T'
>
> and t_cust.DISPSEQ <> 9999
>
> AND t_cust.KFIX = '0'
>
> group by t_cust.RSNO, t_cust.KNO),
>
> t_pov3 as
>
> (select t_cust.RSNO RSNO2, t_cust.KNO, t_cust.AGE, t_cust.GST
>
> from tbl_cust t_cust
>
> inner join t_pov2
>
> on t_pov2.RSNO = t_cust.RSNO
>
> and t_pov2.KNO = t_cust.KNO
>
> and t_pov2.GSTSEQ = t_cust.GSTSEQ)
>
> select *
>
> from t_res
>
> left outer join t_pov3
>
> on t_res.RSNO = t_pov3.RSNO2
>
> and t_res.KNO = t_pov3.KNO
>
>
> ----- execution plan -----
>
> Hash Right Join (cost=125923.21..132076.05 rows=472 width=164) (actual
> time=408.252..410.342 rows=15123 loops=1)
>
> Hash Cond: ((t_cust.RSNO = t_res.RSNO) AND ((t_cust.KNO)::text =
> (t_res.KNO)::text))
>
> CTE t_res
>
> -> Gather (cost=1000.00..58410.51 rows=472 width=27) (actual
> time=55.587..207.684 rows=15123 loops=1)
>
> Workers Planned: 2
>
> Workers Launched: 2
>
> -> Parallel Seq Scan on tbl_res (cost=0.00..57363.31 rows=197
> width=27) (actual time=49.850..204.235 rows=5041 loops=3)
>
> Filter: ((CID >= to_date('2022/07/01'::text,
> 'YYYY/MM/DD'::text)) AND (CID <= to_date('2022/07/31'::text,
> 'YYYY/MM/DD'::text)) AND (COD >= to_date('2022/07/01'::text,
> 'YYYY/MM/DD'::text)) AND (COD <= to_date('2022/07/31'::text,
> 'YYYY/MM/DD'::text)))
>
> Rows Removed by Filter: 161714
>
> -> Nested Loop (cost=67496.18..73648.88 rows=1 width=56) (actual
> time=191.880..191.924 rows=11 loops=1)
>
> -> GroupAggregate (cost=67495.75..67510.49 rows=737 width=50)
> (actual time=191.869..191.878 rows=11 loops=1)
>
> Group Key: t_cust_1.RSNO, t_cust_1.KNO
>
> -> Sort (cost=67495.75..67497.59 rows=737 width=23)
> (actual time=191.859..191.862 rows=13 loops=1)
>
> Sort Key: t_cust_1.RSNO, t_cust_1.KNO
>
> Sort Method: quicksort Memory: 26kB
>
> -> Nested Loop (cost=57118.88..67460.65 rows=737
> width=23) (actual time=172.185..191.837 rows=13 loops=1)
>
> -> Hash Join (cost=57118.45..58758.38 rows=472
> width=14) (actual time=172.154..191.647 rows=13 loops=1)
>
> Hash Cond: ((t_res_1.crsno)::text =
> (t_pov.crsno)::text)
>
> -> CTE Scan on t_res t_res_1
> (cost=0.00..9.44 rows=472 width=72) (actual time=0.003..1.445 rows=15123
> loops=1)
>
> -> Hash (cost=51380.09..51380.09
> rows=330109 width=9) (actual time=170.350..170.350 rows=330109 loops=1)
>
> Buckets: 131072 Batches: 8 Memory
> Usage: 2707kB
>
> -> Seq Scan on tbl_pov t_pov
> (cost=0.00..51380.09 rows=330109 width=9) (actual time=0.029..124.632
> rows=330109 loops=1)
>
> -> Index Scan using tbl_cust_pk on tbl_cust
> t_cust_1 (cost=0.43..18.42 rows=2 width=23) (actual time=0.011..0.012
> rows=1 loops=13)
>
> Index Cond: (RSNO = t_res_1.RSNO)
>
> Filter: ((dispseq <> '9999'::numeric) AND
> ((stsflg)::text = 'T'::text) AND ((KFIX)::text = '0'::text))
>
> Rows Removed by Filter: 2
>
> -> Index Scan using tbl_cust_pk on tbl_cust t_cust
> (cost=0.43..8.31 rows=1 width=61) (actual time=0.003..0.003 rows=1 loops=11)
>
> Index Cond: ((RSNO = t_cust_1.RSNO) AND (gstseq =
> (min(t_cust_1.gstseq))))
>
> Filter: ((t_cust_1.KNO)::text = (KNO)::text)
>
> -> Hash (cost=9.44..9.44 rows=472 width=108) (actual
> time=216.361..216.361 rows=15123 loops=1)
>
> Buckets: 16384 (originally 1024) Batches: 1 (originally 1)
> Memory Usage: 882kB
>
> -> CTE Scan on t_res (cost=0.00..9.44 rows=472 width=108)
> (actual time=55.591..211.698 rows=15123 loops=1)
>
> Planning Time: 1.417 ms
>
> Execution Time: 411.019 ms
>
>
> --------------------------------------------------------------------------------
>
>
>
>
>
>
> At 2023-06-05 22:53:56, "Lorusso Domenico" <domenico(dot)l76(at)gmail(dot)com> wrote:
>
> try this (there is some comment)
>
> with t_res as (
> select RSNO, KNO
> from TBL_RES
> where TBL_RES.CID >= to_date('2022/07/01', 'YYYY/MM/DD')
> and TBL_RES.CID <= to_date('2022/07/31', 'YYYY/MM/DD')
> and TBL_RES.COD >= to_date('2022/07/01', 'YYYY/MM/DD')
> and TBL_RES.COD <= to_date('2022/07/31', 'YYYY/MM/DD')
> ), t_pov2 as (
> select T_CUST.RSNO ,
> T_CUST.KNO ,
> MIN(T_CUST.GSTSEQ) GSTSEQ
> from T_RES -- this is tbl_res already filter by date
> inner join TBL_CUST T_CUST on T_RES.RSNO = T_CUST.RSNO
> inner join TBL_POV T_POV on T_POV.CRSNO = T_RES.CRSNO -- why you use this
> table? it doesn't seem to be used to extract data. Are you trying to
> extract data from T_RES that have at least a record in T_POV? in this case
> could work better move this join in the first with (using distinct or group
> by to ensure there will be just a record for rsno and kno)
> where T_CUST.STSFLG = 'T'
> and T_CUST.DISPSEQ <> 9999
> AND T_CUST.KFIX = '0'
> group by T_CUST.RSNO , T_CUST.KNO
> ), t_pov3 as (
> select T_CUST.RSNO RSNO2 ,
> T_CUST.KNO ,
> T_CUST.AGE ,
> T_CUST.GST
> from TBL_CUST T_CUST
> inner join t_pov2 on T_POV2.RSNO = T_CUST.RSNO
> and T_POV2.KNO = T_CUST.KNO
> and T_POV2.GSTSEQ = T_CUST.GSTSEQ
> )
> select *
> from TBL_RES
> left outer join t_pov3 on TBL_RES.RSNO = T_POV3.RSNO2
> and TBL_RES.KNO = T_POV3.KNO
>
> Il giorno lun 5 giu 2023 alle ore 12:06 gzh <gzhcoder(at)126(dot)com> ha scritto:
>
>> Thank you very much for taking the time to reply to my question.
>>
>> I followed your suggestion and rewrote the SQL using Common Table
>> Expression (CTE).
>>
>> Unfortunately, there was no significant improvement in performance.
>>
>>
>>
>> At 2023-06-05 17:47:25, "Lorusso Domenico" <domenico(dot)l76(at)gmail(dot)com>
>> wrote:
>>
>> Hello,
>> In many case a formal writing and usage of with statement could solve the
>> issue.
>> If you need join, use always join:
>> where T_POV2.RSNO = T_CUST.RSNO
>> and T_POV2.KNO = T_CUST.KNO
>> and T_POV2.GSTSEQ = T_CUST.GSTSEQ)
>> this is an inner join.
>>
>> I mean something like this
>> with t_pov2 as (
>> select T_CUST.RSNO ,
>> T_CUST.KNO ,
>> MIN(T_CUST.GSTSEQ) GSTSEQ
>> from TBL_CUST T_CUST ,
>> TBL_POV T_POV ,
>> TBL_RES T_RES
>> where T_CUST.STSFLG = 'T'
>> and T_CUST.DISPSEQ <> 9999
>> AND T_CUST.KFIX = '0'
>> and T_POV.CRSNO = T_RES.CRSNO
>> and T_RES.RSNO = T_CUST.RSNO
>> group by T_CUST.RSNO , T_CUST.KNO
>> ), t_pov3 as (
>> select T_CUST.RSNO RSNO2 ,
>> T_CUST.KNO ,
>> T_CUST.AGE ,
>> T_CUST.GST
>> from TBL_CUST T_CUST
>> inner join t_pov2 on T_POV2.RSNO = T_CUST.RSNO
>> and T_POV2.KNO = T_CUST.KNO
>> and T_POV2.GSTSEQ = T_CUST.GSTSEQ
>> )
>>
>> select *
>> from TBL_RES
>> left outer join t_pov3 on TBL_RES.RSNO = T_POV3.RSNO2
>> and TBL_RES.KNO = T_POV3.KNO
>> where TBL_RES.CID >= to_date('2022/07/01', 'YYYY/MM/DD')
>> and TBL_RES.CID <= to_date('2022/07/31', 'YYYY/MM/DD')
>> and TBL_RES.COD >= to_date('2022/07/01', 'YYYY/MM/DD')
>> and TBL_RES.COD <= to_date('2022/07/31', 'YYYY/MM/DD')
>>
>> but if tbl_res contain lessere record a good idea is start from
>> this table and use in join with other
>>
>>
>> Il giorno lun 5 giu 2023 alle ore 08:57 gzh <gzhcoder(at)126(dot)com> ha
>> scritto:
>>
>>> Hi everyone,
>>> I'm running into some performance issues with my SQL query.
>>> The following SQL query is taking a long time to execute.
>>>
>>> Execution Plan:
>>> explain analyse
>>> select * from TBL_RES
>>> left outer join(select T_CUST.RSNO RSNO2 ,
>>> T_CUST.KNO ,
>>> T_CUST.AGE ,
>>> T_CUST.GST
>>> from TBL_CUST T_CUST ,
>>> (select T_CUST.RSNO ,
>>> T_CUST.KNO ,
>>> MIN(T_CUST.GSTSEQ) GSTSEQ
>>> from TBL_CUST T_CUST ,
>>> TBL_POV T_POV ,
>>> TBL_RES T_RES
>>> where T_CUST.STSFLG = 'T'
>>> and T_CUST.DISPSEQ <> 9999
>>> AND T_CUST.KFIX = '0'
>>> and T_POV.CRSNO = T_RES.CRSNO
>>> and T_RES.RSNO = T_CUST.RSNO
>>> group by T_CUST.RSNO , T_CUST.KNO) T_POV2
>>> where T_POV2.RSNO = T_CUST.RSNO
>>> and T_POV2.KNO = T_CUST.KNO
>>> and T_POV2.GSTSEQ = T_CUST.GSTSEQ) T_POV3 on TBL_RES.RSNO = T_POV3.RSNO2
>>> and TBL_RES.KNO = T_POV3.KNO
>>> where TBL_RES.CID >= to_date('2022/07/01', 'YYYY/MM/DD')
>>> and TBL_RES.CID <= to_date('2022/07/31', 'YYYY/MM/DD')
>>> and TBL_RES.COD >= to_date('2022/07/01', 'YYYY/MM/DD')
>>> and TBL_RES.COD <= to_date('2022/07/31', 'YYYY/MM/DD')
>>> ----- Execution Plan -----
>>> Nested Loop Left Join (cost=254388.44..452544.70 rows=473 width=3545)
>>> (actual time=3077.312..996048.714 rows=15123 loops=1)
>>> Join Filter: ((TBL_RES.RSNO = T_CUST.RSNO) AND ((TBL_RES.KNO)::text =
>>> (T_CUST.KNO)::text))
>>> Rows Removed by Join Filter: 4992268642
>>> -> Gather (cost=1000.00..58424.35 rows=473 width=3489) (actual
>>> time=0.684..14.158 rows=15123 loops=1)
>>> Workers Planned: 2
>>> Workers Launched: 2
>>> -> Parallel Seq Scan on TBL_RES (cost=0.00..57377.05 rows=197
>>> width=3489) (actual time=0.096..279.504 rows=5041 loops=3)
>>> Filter: ((CID >= to_date('2022/07/01'::text,
>>> 'YYYY/MM/DD'::text)) AND (CID <= to_date('2022/07/31'::text,
>>> 'YYYY/MM/DD'::text)) AND (COD >= to_date('2022/07/01'::text,
>>> 'YYYY/MM/DD'::text)) AND (COD <= to_date('2022/07/31'::text,
>>> 'YYYY/MM/DD'::text)))
>>> Rows Removed by Filter: 161714
>>> -> Materialize (cost=253388.44..394112.08 rows=1 width=56) (actual
>>> time=0.081..26.426 rows=330111 loops=15123)
>>> -> Hash Join (cost=253388.44..394112.07 rows=1 width=56)
>>> (actual time=1197.484..2954.084 rows=330111 loops=1)
>>> Hash Cond: ((T_CUST.RSNO = T_CUST_1.RSNO) AND
>>> ((T_CUST.KNO)::text = (T_CUST_1.KNO)::text) AND (T_CUST.gstseq =
>>> (min(T_CUST_1.gstseq))))
>>> -> Seq Scan on TBL_CUST T_CUST (cost=0.00..79431.15
>>> rows=2000315 width=61) (actual time=0.015..561.005 rows=2000752 loops=1)
>>> -> Hash (cost=246230.90..246230.90 rows=262488 width=50)
>>> (actual time=1197.025..1209.957 rows=330111 loops=1)
>>> Buckets: 65536 Batches: 8 Memory Usage: 2773kB
>>> -> Finalize GroupAggregate
>>> (cost=205244.84..243606.02 rows=262488 width=50) (actual
>>> time=788.552..1116.074 rows=330111 loops=1)
>>> Group Key: T_CUST_1.RSNO, T_CUST_1.KNO
>>> -> Gather Merge (cost=205244.84..238964.80
>>> rows=268846 width=50) (actual time=788.547..982.479 rows=330111 loops=1)
>>> Workers Planned: 2
>>> Workers Launched: 1
>>> -> Partial GroupAggregate
>>> (cost=204244.81..206933.27 rows=134423 width=50) (actual
>>> time=784.032..900.979 rows=165056 loops=2)
>>> Group Key: T_CUST_1.RSNO,
>>> T_CUST_1.KNO
>>> -> Sort
>>> (cost=204244.81..204580.87 rows=134423 width=23) (actual
>>> time=784.019..833.791 rows=165061 loops=2)
>>> Sort Key: T_CUST_1.RSNO,
>>> T_CUST_1.KNO
>>> Sort Method: external merge
>>> Disk: 5480kB
>>> Worker 0: Sort Method:
>>> external merge Disk: 5520kB
>>> -> Parallel Hash Join
>>> (cost=111758.80..190036.38 rows=134423 width=23) (actual
>>> time=645.302..716.247 rows=165061 loops=2)
>>> Hash Cond:
>>> (T_CUST_1.RSNO = T_RES.RSNO)
>>> -> Parallel Seq Scan
>>> on TBL_CUST T_CUST_1 (cost=0.00..74013.63 rows=204760 width=23) (actual
>>> time=0.018..264.390 rows=165058 loops=2)
>>> Filter:
>>> ((dispseq <> '9999'::numeric) AND ((stsflg)::text = 'T'::text) AND
>>> ((KFIX)::text = '0'::text))
>>> Rows Removed by
>>> Filter: 835318
>>> -> Parallel Hash
>>> (cost=109508.52..109508.52 rows=137142 width=8) (actual
>>> time=343.593..343.896 rows=165058 loops=2)
>>> Buckets: 131072
>>> Batches: 8 Memory Usage: 3008kB
>>> -> Parallel
>>> Hash Join (cost=51834.70..109508.52 rows=137142 width=8) (actual
>>> time=256.732..314.368 rows=165058 loops=2)
>>> Hash Cond:
>>> ((T_RES.crsno)::text = (T_POV.crsno)::text)
>>> ->
>>> Parallel Seq Scan on TBL_RES T_RES (cost=0.00..53199.02 rows=208902
>>> width=17) (actual time=0.007..100.510 rows=250132 loops=2)
>>> ->
>>> Parallel Hash (cost=49450.42..49450.42 rows=137142 width=9) (actual
>>> time=122.308..122.309 rows=165054 loops=2)
>>>
>>> Buckets: 131072 Batches: 8 Memory Usage: 2976kB
>>> ->
>>> Parallel Seq Scan on TBL_POV T_POV (cost=0.00..49450.42 rows=137142
>>> width=9) (actual time=0.037..89.470 rows=165054 loops=2)
>>> Planning Time: 1.064 ms
>>> Execution Time: 996062.382 ms
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> The amount of data in the table is as follows.
>>> TBL_RES 500265
>>> TBL_CUST 2000752
>>> TBL_POV 330109
>>>
>>> Any suggestions for improving the performance of the query would be
>>> greatly appreciated.
>>>
>>> Thanks in advance!
>>>
>>>
>>
>> --
>> Domenico L.
>>
>> per stupire mezz'ora basta un libro di storia,
>> io cercai di imparare la Treccani a memoria... [F.d.A.]
>>
>>
>
> --
> Domenico L.
>
> per stupire mezz'ora basta un libro di storia,
> io cercai di imparare la Treccani a memoria... [F.d.A.]
>
>

--
Domenico L.

per stupire mezz'ora basta un libro di storia,
io cercai di imparare la Treccani a memoria... [F.d.A.]

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Lucas Possamai 2023-06-07 01:13:30 PostgreSQL Dependency tree
Previous Message Stephanie Goulet 2023-06-06 17:36:01 Fwd: No prompt for setting up a master password