Slow create temp table

From: Clinton Adams <clinton(dot)adams(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow create temp table
Date: 2014-07-30 18:43:02
Message-ID: CAEuopLY3cptvK=xWCnEfzExXtV75d0YSYN3kwZi_xeYubgei7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greetings,

Any help regarding a sporadic and mysterious issue would be much appreciated.

A pgsql function for loading in data is occasionally taking 12+ hours
to complete versus its normal 1-2 hours, due to a slow down at the
CREATE TEMP TABLE step. During slow runs of the function, the temp
table data file is being written to at 8192 bytes/second. This rate
was consistent at the 5 hour mark up until I canceled the query at 6
hrs in. An immediate rerunning of the function finished in an hour.
Temp table file size was 226 MB and was created in ~15 mins.

PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit

Linux 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux

ProLiant DL380p Gen8, 2 x E5-2620 (hyperthreading on)
96 GB
pgsql data dir mounted on 25 x ssd storage array, connected via fibre
channel, pg_xlog on a RAID 10 hdd array
deadline scheduler
8192 readahead

name | current_setting | source
------------------------------+---------------------------+--------------------
application_name | psql | client
archive_command | ********* | configuration file
archive_mode | on | configuration file
autovacuum | on | configuration file
autovacuum_max_workers | 6 | configuration file
bgwriter_delay | 40ms | configuration file
bgwriter_lru_maxpages | 1000 | configuration file
bgwriter_lru_multiplier | 3 | configuration file
checkpoint_completion_target | 0.9 | configuration file
checkpoint_segments | 1024 | configuration file
checkpoint_timeout | 30min | configuration file
client_encoding | UTF8 | client
cpu_operator_cost | 0.5 | configuration file
cpu_tuple_cost | 0.5 | configuration file
DateStyle | ISO, MDY | configuration file
deadlock_timeout | 10s | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 70GB | configuration file
effective_io_concurrency | 6 | configuration file
full_page_writes | on | configuration file
hot_standby | on | configuration file
hot_standby_feedback | on | configuration file
lc_messages | en_US.UTF-8 | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
listen_addresses | * | configuration file
log_autovacuum_min_duration | 1s | configuration file
log_checkpoints | on | configuration file
log_destination | csvlog | configuration file
log_file_mode | 0600 | configuration file
log_filename | postgresql-%a.log | configuration file
log_lock_waits | on | configuration file
log_min_duration_statement | 250ms | configuration file
log_rotation_age | 1d | configuration file
log_rotation_size | 0 | configuration file
log_statement | ddl | configuration file
log_timezone | America/New_York | configuration file
log_truncate_on_rotation | on | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 2400MB | configuration file
max_connections | 1000 | configuration file
max_stack_depth | 5MB | configuration file
max_wal_senders | 5 | configuration file
port | 5432 | command line
random_page_cost | 4 | session
seq_page_cost | 1 | configuration file
shared_buffers | 8GB | configuration file
shared_preload_libraries | auto_explain | configuration file
stats_temp_directory | /var/lib/pgsql_stat_tmpfs | configuration file
TimeZone | America/New_York | configuration file
track_activities | on | configuration file
track_counts | on | configuration file
track_functions | all | configuration file
track_io_timing | on | configuration file
update_process_title | on | configuration file
wal_buffers | 64MB | configuration file
wal_keep_segments | 2000 | configuration file
wal_level | hot_standby | configuration file
work_mem | 32MB | configuration file

Number of connections at any one time on the database is 300-400, with
the majority idle - there are legacy reasons for that and the high
max_connections.

kernel.shmmax = 50701037568
kernel.shmall = 12378183
vm.swappiness = 0
vm.overcommit_memory = 2
vm.dirty_background_ratio = 2
vm.dirty_background_bytes = 0
vm.dirty_ratio = 5
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

EXPLAIN ANALYZE
CREATE TEMPORARY TABLE temp AS
SELECT rxfill.*, betapat.betapatientid, patrx.rxnum,
patrx.patientrxid as patientrxid, betapat.pharmacypatientid AS
originalpharmacypatientid, store.storeid as betastoreid
FROM rxfilldata_parent rxfill
JOIN (select MAX(id) id, storeid, rxnbr FROM rxfilldata_parent
tmp WHERE clientid = 118 AND tmp.pkgfileid = 417995 GROUP BY storeid,
rxnbr) rxfillmax ON (rxfillmax.id = rxfill.id)
JOIN client.client client ON (rxfill.clientid = client.clientid)
JOIN client.chain chain ON (client.clientid = chain.clientid)
JOIN client.store store ON (chain.chainid = store.chainid AND
rxfill.storeid = store.clientstoreid)
LEFT OUTER JOIN patient.patientrx118 patrx ON (rxfill.clientid =
patrx.clientid AND rxfill.rxnbr = patrx.rxnum AND patrx.storeid =
store.storeid)
LEFT OUTER JOIN patient.betapatient118 betapat ON
(rxfill.clientid = betapat.clientid AND betapat.storeid =
store.storeid AND rxfill.pharmacypatientid =
betapat.pharmacypatientid)
where rxfill.clientid = 118 and rxfill.pkgfileid = 417995;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=520598.91..572134.79 rows=7 width=911)
(actual time=2862.795..90880.990 rows=273566 loops=1)
Join Filter: (betapat.storeid = store.storeid)
Rows Removed by Join Filter: 121214
-> Nested Loop Left Join (cost=520511.91..571468.59 rows=7
width=899) (actual time=2862.725..82299.527 rows=273566 loops=1)
-> Hash Join (cost=520398.91..570601.16 rows=7 width=887)
(actual time=2859.693..5998.178 rows=273566 loops=1)
Hash Cond: ((max(tmp.id)) = rxfill.id)
-> HashAggregate (cost=510782.11..525124.61
rows=28685 width=15) (actual time=728.322..1014.029 rows=273566
loops=1)
-> Bitmap Heap Scan on rxfilldata_parent tmp
(cost=100172.71..370349.11 rows=93622 width=15) (actual
time=44.358..148.145 rows=274808 loops=1)
Recheck Cond: ((clientid = 118) AND
(pkgfileid = 417995))
-> Bitmap Index Scan on ix_neededimport
(cost=0.00..95491.61 rows=93622 width=0) (actual time=38.742..38.742
rows=274808 loops=1)
Index Cond: ((clientid = 118) AND
(pkgfileid = 417995))
-> Hash (cost=9206.80..9206.80 rows=410 width=887)
(actual time=2130.473..2130.473 rows=274808 loops=1)
Buckets: 1024 Batches: 8 (originally 1) Memory
Usage: 32769kB
-> Nested Loop (cost=264.27..9206.80 rows=410
width=887) (actual time=1.239..1216.518 rows=274808 loops=1)
-> Nested Loop (cost=178.77..261.15
rows=13 width=13) (actual time=0.343..2.309 rows=966 loops=1)
-> Nested Loop (cost=110.00..124.51
rows=1 width=8) (actual time=0.093..0.098 rows=1 loops=1)
-> Index Only Scan using
pk_client on client (cost=55.00..60.00 rows=1 width=4) (actual
time=0.076..0.078 rows=1 loops=1)
Index Cond: (clientid = 118)
Heap Fetches: 0
-> Index Scan using ix_chain
on chain (cost=55.00..64.00 rows=1 width=8) (actual time=0.014..0.016
rows=1 loops=1)
Index Cond: (clientid = 118)
-> Bitmap Heap Scan on store
(cost=68.77..129.64 rows=14 width=13) (actual time=0.246..1.572
rows=966 loops=1)
Recheck Cond: (chainid = chain.chainid)
-> Bitmap Index Scan on
ix_store (cost=0.00..68.07 rows=14 width=0) (actual time=0.216..0.216
rows=966 loops=1)
Index Cond: (chainid =
chain.chainid)
-> Index Scan using ix_merge8_daily on
rxfilldata_parent rxfill (cost=85.50..670.63 rows=35 width=883)
(actual time=0.120..1.081 rows=284 loops=966)
Index Cond: (((storeid)::text =
(store.clientstoreid)::text) AND (clientid = 118))
Filter: (pkgfileid = 417995)
Rows Removed by Filter: 292
-> Index Scan using ux2_patientrx118 on patientrx118 patrx
(cost=113.00..123.42 rows=1 width=20) (actual time=0.275..0.277 rows=1
loops=273566)
Index Cond: (((rxfill.rxnbr)::text = (rxnum)::text) AND
(rxfill.clientid = clientid) AND (clientid = 118) AND (storeid =
store.storeid))
-> Index Scan using ix3_betapatient118 on betapatient118 betapat
(cost=87.00..94.17 rows=1 width=20) (actual time=0.025..0.029 rows=1
loops=273566)
Index Cond: ((rxfill.pharmacypatientid)::text =
(pharmacypatientid)::text)
Filter: ((clientid = 118) AND (rxfill.clientid = clientid))
Total runtime: 90945.841 ms

pg_bgwriter_snapshots:
now | checkpoints_timed | checkpoints_req |
checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint |
buffers_clean | maxwritten_clean | buffers_backend |
buffers_backend_fsync | buffers_alloc | stats_reset
-------------------------------+-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
2014-07-29 10:00:01.65379-04 | 4 | 0
| 2108808 | 1586 | 151291 |
2207939 | 83 | 67605 |
0 | 28932559 | 2014-07-29 08:01:01.623033-04
2014-07-29 11:00:01.578856-04 | 2 | 0 |
409840 | 618 | 17033 |
734718 | 53 | 52256 |
0 | 30524848 | 2014-07-29 10:01:01.877218-04
2014-07-29 12:00:01.420009-04 | 4 | 0 |
1939095 | 1120 | 44134 |
1515493 | 124 | 114122 |
0 | 43833671 | 2014-07-29 10:01:01.877218-04
2014-07-29 13:00:01.234634-04 | 2 | 0 |
1364481 | 169 | 16329 |
427183 | 11 | 30784 |
0 | 21273967 | 2014-07-29 12:01:01.542161-04
2014-07-29 14:00:02.007022-04 | 4 | 0 |
2493810 | 607 | 43316 |
1233492 | 115 | 93203 |
0 | 42564936 | 2014-07-29 12:01:01.542161-04
2014-07-29 15:00:01.713446-04 | 2 | 0 |
854284 | 93 | 15033 |
215280 | 28 | 14880 |
0 | 25766378 | 2014-07-29 14:01:01.119926-04
2014-07-29 16:00:01.542989-04 | 4 | 0 |
2704023 | 322 | 28730 |
330864 | 28 | 22278 |
0 | 43323265 | 2014-07-29 14:01:01.119926-04
2014-07-29 17:00:01.39066-04 | 2 | 0 |
809083 | 139 | 8264 |
167504 | 0 | 15206 |
0 | 33998495 | 2014-07-29 16:01:01.671427-04
2014-07-29 18:00:01.375621-04 | 4 | 0 |
1767296 | 335 | 17826 |
252109 | 0 | 29399 |
0 | 58038221 | 2014-07-29 16:01:01.671427-04
2014-07-29 19:00:01.252766-04 | 2 | 0 |
851746 | 278 | 30729 |
1737370 | 149 | 36806 |
0 | 41487870 | 2014-07-29 18:01:01.474614-04
2014-07-29 20:00:02.07362-04 | 4 | 0 |
1026422 | 364 | 44182 |
2209503 | 219 | 68020 |
0 | 63828691 | 2014-07-29 18:01:01.474614-04

The sar output for the timeframe the problem occurred is available as
well, if that would be helpful, but does not show any issues of
unusual load.

Thanks for any help,
Clinton

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2014-07-30 18:53:07 Re: Why you should turn on Checksums with SSDs
Previous Message Tomas Vondra 2014-07-30 12:47:23 Re: 60 core performance with 9.3