postgres performance

From: chidamparam muthusamy <mchidamparam(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: postgres performance
Date: 2013-12-06 17:36:58
Message-ID: CABUk_4jr9z3CU-5b8tMvgEhAsKJ8+JKRn+TivPGAis9FOZo1jQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

hi,
Registered with PostgreSQL Help Forum to identify and resolve the Postgres
DB performance issue, received suggestions but could not improve the
speed/response time. Please help.

Details:
Postgres Version 9.3.1
Server configuration:
Processor: 2 x Intel Quad core E5620 @ 2.40GHz
RAM: 16 GB

Postgres configuration:
Effective cache size = 10 GB
shared Buffer = 1250 MB
random page cost = 4

Table size = 60 GB
Number of records = 44 million
Carried out Vacuum Analyze after inserting new records and also after
creating Index,
6 months data, every month around 10 GB will get added. Expecting good
performance with 3 years data.
DB Will be used for Reporting/Read, will not be used for transaction. Daily
records will be inserted through bulk insertion every day.

Table schema:
Table "public.detailed_report"
Column | Type | Modifiers
-------------------------------+----------------------------+-----------
group_id | character varying(50) | not null
client | character varying(50) |
gateway | character varying(50) |
call_id | character varying(120) | not null
parent_call_id | character varying(120) |
start_time | timestamp with time zone | not null
connect_time | timestamp with time zone |
end_time | timestamp with time zone |
duration | integer |
source | character varying(50) |
source_alias | character varying(50) |
dest_in_number | character varying(50) |
dest_out_number | character varying(50) |
bp_code_pay | character varying[] |
billed_duration_pay | integer[] |
rate_pay | character varying[] |
rate_effective_date_pay | timestamp with time zone[] |
type_value_pay | character varying[] |
slab_time_pay | character varying[] |
pin_pay | bigint[] |
amount_pay | double precision[] |
adjusted_pin_pay | bigint[] |
adjusted_amount_pay | double precision[] |
call_amount_pay | double precision |
country_code_pay | character varying[] |
country_desc_pay | character varying[] |
master_country_code | character varying(15) |
master_country_desc | character varying(100) |
bp_code_recv | character varying[] |
billed_duration_recv | integer[] |
rate_recv | character varying[] |
rate_effective_date_recv | timestamp with time zone[] |
type_value_recv | character varying[] |
slab_time_recv | character varying[] |
pin_recv | bigint[] |
amount_recv | double precision[] |
adjusted_pin_recv | bigint[] |
adjusted_amount_recv | double precision[] |
call_amount_recv | double precision |
country_code_recv | character varying[] |
country_desc_recv | character varying[] |
subscriber_type | character varying(50) |
pdd | smallint |
disconnect_reason | character varying(200) |
source_ip | character varying(20) |
dest_ip | character varying(20) |
caller_hop | character varying(20) |
callee_hop | character varying(20) |
caller_received_from_hop | character varying(20) |
callee_sent_to_hop | character varying(20) |
caller_media_ip_port | character varying(25) |
callee_media_ip_port | character varying(25) |
caller_original_media_ip_port | character varying(25) |
callee_original_media_ip_port | character varying(25) |
switch_ip | character varying(20) |
call_shop_amount_paid | boolean |
version | character varying |
call_duration_pay | integer |
call_duration_recv | integer |
audio_codec | character varying(5) |
video_codec | character varying(5) |
shadow_amount_recv | double precision |
shadow_amount_pay | double precision |
pulse_applied_recv | character varying(50) |
pulse_applied_pay | character varying(50) |

Index, multi column, 3 columns, matches exactly with query where condition
"endtime_groupid_client_tsidx_detail_report" btree (end_time DESC,
group_id, client), tablespace "indexspace" which exactly matches with
'where' condition,
" WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01 00:00' and
group_id='admin' and client ='CHOICE' GROUP by client, gateway;"
Index on a separate tablespace on another hard disk.

Query:
EXPLAIN (analyze, buffers) SELECT text(client) as client, text(gateway) as
gateway,count(*)::bigint as total_calls, (avg(duration)/1000.0)
::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2) as pdd,
sum(call_duration_recv)/1000.0 as duration_recv,
sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as
call_amount_recv, sum(call_amount_pay) as call_amount_
pay FROM detailed_report WHERE end_time>='2013-05-01 00:00' and
end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP
by client, gateway ORDER BY call_amount_recv DESC;
QUERY PLAN
------------------------------------------------------
Sort (cost=3422863.06..3422868.69 rows=2254 width=44) (actual
time=137852.474..137852.474 rows=5 loops=1)
Sort Key: (sum(call_amount_recv))
Sort Method: quicksort Memory: 25kB
Buffers: shared read=2491664
-> HashAggregate (cost=3422664.28..3422737.53 rows=2254 width=44) (actual
time=137852.402..137852.454 rows=5 loops=1)
Buffers: shared read=2491664
-> Bitmap Heap Scan on detailed_report (cost=644828.11..3399506.87
rows=1029218 width=44) (actual time=4499.558..125443.122 rows=5248227
loops=1)
Recheck Cond: ((end_time >= '2013-05-01 00:00:00+00'::timestamp with time
zone) AND (end_time < '2013-07-01 00:00:00+00'::timestamp with time zone)
AND ((group_id)::text = 'adm
in'::text) AND ((client)::text = 'CHOICE'::text))
Buffers: shared read=2491664
-> Bitmap Index Scan on endtime_groupid_client_tsidx_detail_report
(cost=0.00..644570.81 rows=1029218 width=0) (actual time=3418.754..3418.754
rows=5248227 loops=1)
Index Cond: ((end_time >= '2013-05-01 00:00:00+00'::timestamp with time
zone) AND (end_time < '2013-07-01 00:00:00+00'::timestamp with time zone)
AND ((group_id)::text =
'admin'::text) AND ((client)::text = 'CHOICE'::text))
Buffers: shared read=95055
Total runtime: *137868.946 ms*
(13 rows)

Checked by removing ORDER BY, but no improvement.

By increasing random_page_cost to 25, the query gets executed
sequentially,Seq Scan on detailed_report, time taken is comparatively
better than Indexed scan. But I am not preferring because when the data
grows the sequential scan performance will come down.

carried out 3 more set of tests:
1. Index on columns
2. multiple column index, with 2 columns
3. multiple column index, with three columns

Test Case 1:
************
indexes :
1)client
2)group_id
3)gateway
4)end_time

"det_rep_pkey" PRIMARY KEY, btree (group_id, call_id, start_time)
"client_detailed_report_idx" btree (client), tablespace "indexspace"
"end_time_detailed_report_idx" btree (end_time), tablespace "indexspace"
"gateway_detailed_report_idx" btree (gateway), tablespace "indexspace"
"group_id_detailed_report_idx" btree (group_id), tablespace "indexspace"

testdb=# EXPLAIN (analyze,buffers,verbose)SELECT text(client) as client,
text(gateway) as gateway,count(*)::bigint as total_calls,
(avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2)
as pdd, sum(call_duration_recv)/1000.0 as duration_recv,
sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as
call_amount_recv, sum(call_amount_pay) as call_amount_pay FROM
detailed_report WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01
00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway
ORDER BY call_amount_recv DESC;
QUERY PLAN

Sort (cost=3510106.93..3510112.25 rows=2127 width=44) (actual
time=148557.599..148557.599 rows=5 loops=1)
Output: ((client)::text), ((gateway)::text), (count(*)), (((avg(duration) /
1000.0))::numeric(10,2)), ((avg(pdd))::numeric(10,2)),
(((sum(call_duration_recv))::numeric / 1000.0)), (((sum(c
all_duration_pay))::numeric / 1000.0)), (sum(call_amount_recv)),
(sum(call_amount_pay)), client, gateway
Sort Key: (sum(detailed_report.call_amount_recv))
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=69 read=2505035
-> HashAggregate (cost=3509920.24..3509989.37 rows=2127 width=44) (actual
time=148557.556..148557.581 rows=5 loops=1)
Output: (client)::text, (gateway)::text, count(*), ((avg(duration) /
1000.0))::numeric(10,2), (avg(pdd))::numeric(10,2),
((sum(call_duration_recv))::numeric / 1000.0), ((sum(call_dur
ation_pay))::numeric / 1000.0), sum(call_amount_recv),
sum(call_amount_pay), client, gateway
Buffers: shared hit=69 read=2505035
-> Bitmap Heap Scan on public.detailed_report (cost=832774.93..3487872.62
rows=979894 width=44) (actual time=14257.148..135355.676 rows=5248227
loops=1)
Output: group_id, client, gateway, call_id, parent_call_id, start_time,
connect_time, end_time, duration, source, source_alias, dest_in_number,
dest_out_number, bp_code_pay, bi
lled_duration_pay, rate_pay, rate_effective_date_pay, type_value_pay,
slab_time_pay, pin_pay, amount_pay, adjusted_pin_pay, adjusted_amount_pay,
call_amount_pay, country_code_pay, country_des
c_pay, master_country_code, master_country_desc, bp_code_recv,
billed_duration_recv, rate_recv, rate_effective_date_recv, type_value_recv,
slab_time_recv, pin_recv, amount_recv, adjusted_pin_
recv, adjusted_amount_recv, call_amount_recv, country_code_recv,
country_desc_recv, subscriber_type, pdd, disconnect_reason, source_ip,
dest_ip, caller_hop, callee_hop, caller_received_from_h
op, callee_sent_to_hop, caller_media_ip_port, callee_media_ip_port,
caller_original_media_ip_port, callee_original_media_ip_port, switch_ip,
call_shop_amount_paid, version, call_duration_pay,
call_duration_recv, audio_codec, video_codec, shadow_amount_recv,
shadow_amount_pay, pulse_applied_recv, pulse_applied_pay
Recheck Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND
((detailed_report.group_id)::text = 'admin'::text) AND
(detailed_report.end_time >= '2013-05-01 00:00:00+00
'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01
00:00:00+00'::timestamp with time zone))
Buffers: shared hit=69 read=2505035
-> BitmapAnd (cost=832774.93..832774.93 rows=979894 width=0) (actual
time=13007.643..13007.643 rows=0 loops=1)
Buffers: shared read=108495
-> Bitmap Index Scan on client_detailed_report_idx (cost=0.00..172876.66
rows=7862413 width=0) (actual time=2546.204..2546.204 rows=7840766 loops=1)
Index Cond: ((detailed_report.client)::text = 'CHOICE'::text)
Buffers: shared read=21427
-> Bitmap Index Scan on group_id_detailed_report_idx (cost=0.00..307105.20
rows=14971818 width=0) (actual time=4265.728..4265.728 rows=14945965
loops=1)
Index Cond: ((detailed_report.group_id)::text = 'admin'::text)
Buffers: shared read=40840
-> Bitmap Index Scan on end_time_detailed_report_idx (cost=0.00..352057.65
rows=16790108 width=0) (actual time=3489.106..3489.106 rows=16917795
loops=1)
Index Cond: ((detailed_report.end_time >= '2013-05-01
00:00:00+00'::timestamp with time zone) AND (detailed_report.end_time <
'2013-07-01 00:00:00+00'::timestamp wi
th time zone))
Buffers: shared read=46228
Total runtime:* 148558.070 ms*
(24 rows)

Test Case 2:
************
Indexes :
1)client
2)group_id
3)gateway
4)end_time
5)client,group_id

"det_rep_pkey" PRIMARY KEY, btree (group_id, call_id, start_time)
"client_detailed_report_idx" btree (client), tablespace "indexspace"
"clientgroupid_detailed_report_idx" btree (client, group_id), tablespace
"indexspace"
"end_time_detailed_report_idx" btree (end_time), tablespace "indexspace"
"gateway_detailed_report_idx" btree (gateway), tablespace "indexspace"
"group_id_detailed_report_idx" btree (group_id), tablespace "indexspace"

testdb=# EXPLAIN (analyze,buffers,verbose)SELECT text(client) as client,
text(gateway) as gateway,count(*)::bigint as total_calls,
(avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2)
as pdd, sum(call_duration_recv)/1000.0 as duration_recv,
sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as
call_amount_recv, sum(call_amount_pay) as call_amount_pay FROM
detailed_report WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01
00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway
ORDER BY call_amount_recv DESC;

QUERY PLAN
Sort (cost=3172381.37..3172387.11 rows=2297 width=44) (actual
time=132725.901..132725.901 rows=5 loops=1)
Output: ((client)::text), ((gateway)::text), (count(*)), (((avg(duration) /
1000.0))::numeric(10,2)), ((avg(pdd))::numeric(10,2)),
(((sum(call_duration_recv))::numeric / 1000.0)), (((sum(c
all_duration_pay))::numeric / 1000.0)), (sum(call_amount_recv)),
(sum(call_amount_pay)), client, gateway
Sort Key: (sum(detailed_report.call_amount_recv))
Sort Method: quicksort Memory: 25kB
Buffers: shared read=2472883
-> HashAggregate (cost=3172178.48..3172253.13 rows=2297 width=44) (actual
time=132725.861..132725.881 rows=5 loops=1)
Output: (client)::text, (gateway)::text, count(*), ((avg(duration) /
1000.0))::numeric(10,2), (avg(pdd))::numeric(10,2),
((sum(call_duration_recv))::numeric / 1000.0), ((sum(call_dur
ation_pay))::numeric / 1000.0), sum(call_amount_recv),
sum(call_amount_pay), client, gateway
Buffers: shared read=2472883
-> Bitmap Heap Scan on public.detailed_report (cost=434121.21..3149462.57
rows=1009596 width=44) (actual time=8257.581..120311.450 rows=5248227
loops=1)
Output: group_id, client, gateway, call_id, parent_call_id, start_time,
connect_time, end_time, duration, source, source_alias, dest_in_number,
dest_out_number, bp_code_pay, bi
lled_duration_pay, rate_pay, rate_effective_date_pay, type_value_pay,
slab_time_pay, pin_pay, amount_pay, adjusted_pin_pay, adjusted_amount_pay,
call_amount_pay, country_code_pay, country_des
c_pay, master_country_code, master_country_desc, bp_code_recv,
billed_duration_recv, rate_recv, rate_effective_date_recv, type_value_recv,
slab_time_recv, pin_recv, amount_recv, adjusted_pin_
recv, adjusted_amount_recv, call_amount_recv, country_code_recv,
country_desc_recv, subscriber_type, pdd, disconnect_reason, source_ip,
dest_ip, caller_hop, callee_hop, caller_received_from_h
op, callee_sent_to_hop, caller_media_ip_port, callee_media_ip_port,
caller_original_media_ip_port, callee_original_media_ip_port, switch_ip,
call_shop_amount_paid, version, call_duration_pay,
call_duration_recv, audio_codec, video_codec, shadow_amount_recv,
shadow_amount_pay, pulse_applied_recv, pulse_applied_pay
Recheck Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND
((detailed_report.group_id)::text = 'admin'::text) AND
(detailed_report.end_time >= '2013-05-01 00:00:00+00
'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01
00:00:00+00'::timestamp with time zone))
Buffers: shared read=2472883
-> BitmapAnd (cost=434121.21..434121.21 rows=1009596 width=0) (actual
time=7101.419..7101.419 rows=0 loops=1)
Buffers: shared read=76274
-> Bitmap Index Scan on clientgroupid_detailed_report_idx
(cost=0.00..74766.52 rows=2649396 width=0) (actual time=3066.346..3066.346
rows=7840766 loops=1)
Index Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND
((detailed_report.group_id)::text = 'admin'::text))
Buffers: shared read=30046
-> Bitmap Index Scan on end_time_detailed_report_idx (cost=0.00..358849.64
rows=17114107 width=0) (actual time=2969.577..2969.577 rows=16917795
loops=1)
Index Cond: ((detailed_report.end_time >= '2013-05-01
00:00:00+00'::timestamp with time zone) AND (detailed_report.end_time <
'2013-07-01 00:00:00+00'::timestamp wi
th time zone))
Buffers: shared read=46228
Total runtime:* 132726.073 ms*
(21 rows)

Test Case 3:
************
Indexes:
Index :
1)client
2)group_id
3)gateway
4)end_time
5)client,group_id
6)client,group_id,end_time

"det_rep_pkey" PRIMARY KEY, btree (group_id, call_id, start_time)
"client_detailed_report_idx" btree (client), tablespace "indexspace"
"clientgroupid_detailed_report_idx" btree (client, group_id), tablespace
"indexspace"
"clientgroupidendtime_detailed_report_idx" btree (client, group_id,
end_time), tablespace "indexspace"
"end_time_detailed_report_idx" btree (end_time), tablespace "indexspace"
"gateway_detailed_report_idx" btree (gateway), tablespace "indexspace"
"group_id_detailed_report_idx" btree (group_id), tablespace "indexspace"

testdb=# EXPLAIN (analyze, verbose) SELECT text(client) as client,
text(gateway) as gateway,count(*)::bigint as total_calls,
(avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2)
as pdd, sum(call_duration_recv)/1000.0 as duration_recv,
sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as
call_amount_recv, sum(call_amount_pay) as call_amount_
pay FROM detailed_report WHERE end_time>='2013-05-01 00:00' and
end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP
by client, gateway ORDER BY call_amount_recv DESC;

QUERY PLAN

Sort (cost=2725603.99..2725609.46 rows=2188 width=44) (actual
time=137713.264..137713.265 rows=5 loops=1)
Output: ((client)::text), ((gateway)::text), (count(*)), (((avg(duration) /
1000.0))::numeric(10,2)), ((avg(pdd))::numeric(10,2)),
(((sum(call_duration_recv))::numeric / 1000.0)), (((sum(c
all_duration_pay))::numeric / 1000.0)), (sum(call_amount_recv)),
(sum(call_amount_pay)), client, gateway
Sort Key: (sum(detailed_report.call_amount_recv))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=2725411.50..2725482.61 rows=2188 width=44) (actual
time=137713.192..137713.215 rows=5 loops=1)
Output: (client)::text, (gateway)::text, count(*), ((avg(duration) /
1000.0))::numeric(10,2), (avg(pdd))::numeric(10,2),
((sum(call_duration_recv))::numeric / 1000.0), ((sum(call_dur
ation_pay))::numeric / 1000.0), sum(call_amount_recv),
sum(call_amount_pay), client, gateway
-> Bitmap Heap Scan on public.detailed_report (cost=37356.61..2703244.88
rows=985183 width=44) (actual time=3925.850..124647.660 rows=5248227
loops=1)
Output: group_id, client, gateway, call_id, parent_call_id, start_time,
connect_time, end_time, duration, source, source_alias, dest_in_number,
dest_out_number, bp_code_pay, bi
lled_duration_pay, rate_pay, rate_effective_date_pay, type_value_pay,
slab_time_pay, pin_pay, amount_pay, adjusted_pin_pay, adjusted_amount_pay,
call_amount_pay, country_code_pay, country_des
c_pay, master_country_code, master_country_desc, bp_code_recv,
billed_duration_recv, rate_recv, rate_effective_date_recv, type_value_recv,
slab_time_recv, pin_recv, amount_recv, adjusted_pin_
recv, adjusted_amount_recv, call_amount_recv, country_code_recv,
country_desc_recv, subscriber_type, pdd, disconnect_reason, source_ip,
dest_ip, caller_hop, callee_hop, caller_received_from_h
op, callee_sent_to_hop, caller_media_ip_port, callee_media_ip_port,
caller_original_media_ip_port, callee_original_media_ip_port, switch_ip,
call_shop_amount_paid, version, call_duration_pay,
call_duration_recv, audio_codec, video_codec, shadow_amount_recv,
shadow_amount_pay, pulse_applied_recv, pulse_applied_pay
Recheck Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND
((detailed_report.group_id)::text = 'admin'::text) AND
(detailed_report.end_time >= '2013-05-01 00:00:00+00
'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01
00:00:00+00'::timestamp with time zone))
-> Bitmap Index Scan on clientgroupidendtime_detailed_report_idx
(cost=0.00..37110.31 rows=985183 width=0) (actual time=2820.150..2820.150
rows=5248227 loops=1)
Index Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND
((detailed_report.group_id)::text = 'admin'::text) AND
(detailed_report.end_time >= '2013-05-01 00:00:0
0+00'::timestamp with time zone) AND (detailed_report.end_time <
'2013-07-01 00:00:00+00'::timestamp with time zone))
Total runtime: *137728.029 ms*
(12 rows)

Tried by creating partial Index on group_id column for the value 'admin'
and also end_time column for one month range.

With all the above experiment, could not reduce the response time, please
suggest.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alan Hodgson 2013-12-06 18:16:03 Re: postgres performance
Previous Message Janek Sendrowski 2013-12-06 16:21:13 Re: [PERFORM] Similarity search with the tsearch2 extension