Re: CPU Consuming query. Sequential scan despite indexing.

From: aditya desai <admad123(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU Consuming query. Sequential scan despite indexing.
Date: 2020-10-20 12:30:44
Message-ID: CAN0SRDGvmdEYYzQQjq2+HUwpMd5f=ThrjO3CHZ=iCZv0Mjs59w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Laurenz,
I created

On Fri, Oct 16, 2020 at 2:06 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
wrote:

> On Thu, 2020-10-15 at 20:34 +0530, aditya desai wrote:
> > Below query always shows up on top in the CPU matrix. Also despite
> having indexes it does sequential scans
> > (probably because WHERE condition satisfies almost all of the data from
> table). This query
> > runs on the default landing page in application and needs to fetch
> records in less that 100 ms
> > without consuming too much CPU.
> >
> > Any opinions? Table is very huge and due to referential identity and
> business requirements we could not
> > implement partitioning as well.
> >
> > There is index on (countrycode,facilitycode,jobstartdatetime)
> >
> > explain (analyze,buffers) with JobCount as ( select
> jobstatuscode,count(1) stat_count from job j
> > where 1=1 and j.countrycode = 'TH'
> > and j.facilitycode in
> ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
> > and ((j.jobstartdatetime between '2020-08-01 00:00:00' and '2020-09-30
> 00:00:00' ) or j.jobstartdatetime IS NULL ) group by j.jobstatuscode)
> > select js.jobstatuscode,COALESCE(stat_count,0) stat_count from JobCount
> jc right outer join jobstatus js on jc.jobstatuscode=js.jobstatuscode;
> >
> > QUERY PLAN
> >
> > Hash Right Join (cost=98845.93..98846.10 rows=10 width=12) (actual
> time=1314.809..1314.849 rows=10 loops=1)
> > -> Parallel Seq Scan on job j
> (cost=0.00..96837.93 rows=200963 width=4) (actual time=13.010..1144.434
> rows=163200 loops=3)
> > Filter: (((countrycode)::text =
> 'TH'::text) AND (((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp
> without time zone) AND (jobst
> > artdatetime <= '2020-09-30 00:00:00'::timestamp without time zone)) OR
> (jobstartdatetime IS NULL)) AND ((facilitycode)::text = ANY
> ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1
> > ,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
> > Rows Removed by Filter: 449035
> > Buffers: shared hit=60086 read=11834
> > I/O Timings: read=59.194
> >
>
> You should rewrite the subquery as a UNION to avoid the OR:
>
> ... WHERE j.countrycode = 'TH'
> and j.facilitycode in
> ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
> and j.jobstartdatetime between '2020-08-01 00:00:00' and
> '2020-09-30 00:00:00'
>
> and
>
> ... WHERE j.countrycode = 'TH'
> and j.facilitycode in
> ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
> and j.jobstartdatetime IS NULL
>
> These indexes could speed up the resulting query:
>
> CREATE INDEX ON job (countrycode, facilitycode);
> CREATE INDEX ON job (countrycode, jobstartdatetime);
> CREATE INDEX ON job (countrycode, facilitycode) WHERE jobstartdaytime IS
> NULL;
>

I created the indexes you suggested and changed the query with the UNION
operator. Please see explain plan below. Performance of the query(execution
time has improved mostly because I ran vacuum full). Cost of the query is
still high.This is Dev envrionment and has 2 vCPU and 8 GB RAM.

explain (analyze,buffers) with JobCount as ( (select jobstatuscode,count(1)
stat_count from job j where 1=1 and j.countrycode = 'TH' and
j.facilitycode in
('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
and j.jobstartdatetime between '2020-08-01 00:00:00' and '2020-09-30
00:00:00' group by j.jobstatuscode) UNION (select jobstatuscode,count(1)
stat_count from job j where 1=1 and j.countrycode = 'TH' and
j.facilitycode in
('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
and j.jobstartdatetime is null group by j.jobstatuscode))
lmp_delivery_jobs-> select js.jobstatuscode,COALESCE(stat_count,0)
stat_count from JobCount jc right outer join jobstatus js on
jc.jobstatuscode=js.jobstatuscode;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------
Hash Right Join (cost=79010.89..79011.19 rows=10 width=12) (actual
time=444.241..444.256 rows=10 loops=1)
Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
Buffers: shared hit=8560
CTE jobcount
-> HashAggregate (cost=79002.35..79002.48 rows=13 width=24) (actual
time=444.211..444.213 rows=6 loops=1)
Group Key: j.jobstatuscode, (count(1))
Buffers: shared hit=8558
-> Append (cost=78959.64..79002.28 rows=13 width=24) (actual
time=444.081..444.202 rows=6 loops=1)
Buffers: shared hit=8558
-> Finalize GroupAggregate (cost=78959.64..78961.41
rows=7 width=12) (actual time=444.079..444.101 rows=6 loops=1)
Group Key: j.jobstatuscode
Buffers: shared hit=8546
-> Gather Merge (cost=78959.64..78961.27 rows=14
width=12) (actual time=444.063..444.526 rows=18 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=17636
-> Sort (cost=77959.61..77959.63 rows=7
width=12) (actual time=435.748..435.750 rows=6 loops=3)
Sort Key: j.jobstatuscode
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort
Memory: 25kB
Worker 1: Sort Method: quicksort
Memory: 25kB
Buffers: shared hit=17636
-> Partial HashAggregate
(cost=77959.44..77959.51 rows=7 width=12) (actual time=435.703..435.706
rows=6 loops=3)
Group Key: j.jobstatuscode
Buffers: shared hit=17620
-> Parallel Bitmap Heap Scan on
job j (cost=11528.22..76957.69 rows=200351 width=4) (actual
time=47.682..281.928 rows=163200
loops=3)
Recheck Cond:
(((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY
('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,T
HPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
Filter: ((jobstartdatetime
>= '2020-08-01 00:00:00'::timestamp without time zone) AND
(jobstartdatetime <= '2020-09-30 00
:00:00'::timestamp without time zone))
Heap Blocks: exact=6633
Buffers: shared hit=17620
-> Bitmap Index Scan on
job_list_test1 (cost=0.00..11408.01 rows=482693 width=0) (actual
time=49.825..49.826 rows=48960
0 loops=1)
Index Cond:
(((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY
('{THNPM1,THPRK1,THCNT1,THSPN1,THKR
I1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
Buffers: shared
hit=1913
-> GroupAggregate (cost=40.50..40.68 rows=6 width=12)
(actual time=0.093..0.094 rows=0 loops=1)
Group Key: j_1.jobstatuscode
Buffers: shared hit=12
-> Sort (cost=40.50..40.54 rows=16 width=4)
(actual time=0.092..0.092 rows=0 loops=1)
Sort Key: j_1.jobstatuscode
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=12
-> Index Scan using job_list_test3 on job
j_1 (cost=0.14..40.18 rows=16 width=4) (actual time=0.081..0.082 rows=0
loops=1)
Index Cond: (((countrycode)::text =
'TH'::text) AND ((facilitycode)::text = ANY
('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,T
HUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
Buffers: shared hit=12
-> CTE Scan on jobcount jc (cost=0.00..0.26 rows=13 width=24) (actual
time=444.215..444.221 rows=6 loops=1)
Buffers: shared hit=8558
-> Hash (cost=8.29..8.29 rows=10 width=4) (actual time=0.016..0.016
rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=2
-> Index Only Scan using jobstatus_jobstatuscode_unq on jobstatus
js (cost=0.14..8.29 rows=10 width=4) (actual time=0.006..0.010 rows=10
loops=1)
Heap Fetches: 0
Buffers: shared hit=2
Planning Time: 0.808 ms
Execution Time: 444.819 ms
(53 rows)

>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message aditya desai 2020-10-20 12:56:08 Re: CPU Consuming query. Sequential scan despite indexing.
Previous Message Mats Julian Olsen 2020-10-20 11:20:27 Re: Query Performance / Planner estimate off