From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Strange serialization problem
Date: 2005-04-13 22:16:54
Message-ID: 1113430614.425d9a560e5b5@webmail.telus.net
Lists: pgsql-novice pgsql-performance
I have a performance problem; I'd like any suggestions on where to continue
investigation.
A set of insert-only processes seems to serialize itself. :-(
The processes appear to be blocked on disk I/O, probably on the drive holding the
base table rather than on the pg_xlog drive.
Each process is inserting a block of 10K rows into a table.
I'm guessing they are "serialized" because one process by itself takes 15-20
secs, while running ten processes in parallel averages 100-150 secs each, with
elapsed (wall) time of 150-200 secs (roughly ten times the single-process time,
as if the batches ran one after another).
Polling pg_locks shows each process has (been granted) only the locks you would
expect. I RARELY see an Exclusive lock on an index, and then only on one index
at a time.
A sample from pg_locks:
TABLE/INDEX   GRANTED  PID    MODE
m_reason      t        7340   AccessShare
message       t        7340   AccessShare
message       t        7340   RowExclusive
pk_message    t        7340   AccessShare
tmp_message   t        7340   AccessShare
("m_reason" is a one-row lookup table; see INSERT cmd below).
--------------------------
The query plan is quite reasonable (see below).
On a side note, this is the first app I've had to deal with that is sweet to
pg_xlog, but hammers the drive bearing the base table (3x the traffic).
"log_executor_stats" for a sample insert look reasonable (except the "elapsed"!)
! system usage stats:
! 308.591728 elapsed 3.480000 user 1.270000 system sec
! [4.000000 user 1.390000 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 18212/15 [19002/418] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 9675 read, 8781 written, buffer hit rate = 97.66%
! Local blocks: 504 read, 64 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
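(These per-statement stats can be captured per session, given sufficient
privileges, with "SET log_executor_stats = on;"; the block above then appears
in the server log after each statement.)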
Summarized "ps" output for the above backend process, sampled every 5 secs,
shows it is 94% in the 'D' state, 3% in the 'S' state.
================
== BACKGROUND ==
================
**SOFTWARE
- PG 7.4.6, RedHat 8.
----------------------------------
**HARDWARE
Xeon 2x2 2.4GHz 2GB RAM
4 x 73GB SCSI; pg_xlog and base on separate drives.
----------------------------------
**APPLICATION
Six machines post batches of 10K messages to the PG db server.
Machine #nn generates its ID keys as "nn00000000001"::bigint etc.
Each process runs:
- "COPY tmp_message FROM STDIN" loads its own one-use TEMP table.
- " INSERT INTO message
SELECT tmp.* FROM tmp_message AS tmp
JOIN m_reason ON m_reason.name = tmp.reason
LEFT JOIN message USING (ID) WHERE message.ID is null
(check required because crash recovery logic requires idempotent insert)
"DROP TABLE tmp_message" --- call me paranoid, this is 7.4
The COPY step time is almost constant when #processes varies from 1 to 10.
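Putting one batch together, the sequence is roughly the following. This is a
sketch only: the real DDL for tmp_message and the COPY payload aren't shown,
and the CREATE TEMP TABLE line is just one way to get a table with message's
layout.
    -- One loader batch (sketch; tmp_message assumed to mirror message's columns)
    CREATE TEMP TABLE tmp_message AS
        SELECT * FROM message LIMIT 0;              -- empty copy of message's layout
    COPY tmp_message FROM STDIN;                    -- client streams the 10K-row batch
    INSERT INTO message
        SELECT tmp.*
        FROM tmp_message AS tmp
        JOIN m_reason ON m_reason.name = tmp.reason -- one-row reason lookup
        LEFT JOIN message USING (ID)
        WHERE message.ID IS NULL;                   -- skip rows already present (idempotent)
    DROP TABLE tmp_message;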
----------------------------------
**POSTGRES
pg_autovacuum is running with default parameters.
Non-default GUC values:
checkpoint_segments = 512
default_statistics_target = 200
effective_cache_size = 500000
log_min_duration_statement = 1000
max_fsm_pages = 1000000
max_fsm_relations = 1000
random_page_cost = 1
shared_buffers = 10000
sort_mem = 16384
stats_block_level = true
stats_command_string = true
stats_row_level = true
vacuum_mem = 65536
wal_buffers = 2000
wal_buffers and checkpoint_segments look outrageous, but were tuned for another
process that posts batches of 10000 6KB rows in a single insert.
----------------------------------
TABLE/INDEX STATISTICS
----------------------------------
MACHINE STATISTICS
ps gives the backend process as >98% in (D) state, with <1% CPU.
A "top" snapshot:
CPU states: cpu user nice system irq softirq iowait idle
total 2.0% 0.0% 0.8% 0.0% 0.0% 96.9% 0.0%
cpu00 2.5% 0.0% 1.9% 0.0% 0.0% 95.4% 0.0%
cpu01 1.7% 0.0% 0.1% 0.0% 0.3% 97.6% 0.0%
cpu02 0.5% 0.0% 0.7% 0.0% 0.0% 98.6% 0.0%
cpu03 3.1% 0.0% 0.5% 0.0% 0.0% 96.2% 0.0%
Mem: 2061552k av, 2041752k used, 19800k free, 0k shrd, 21020k buff
iostat reports that the $PGDATA/base drive is being worked but not overworked.
The pg_xlog drive is underworked:
          KBPS  TPS    KBPS  TPS    KBPS  TPS    KBPS  TPS
12:30        1    2     763   16      31    8    3336   269
12:40        5    3    1151   22       5    5    2705   320
                       ^pg_xlog^                 ^base^
The base drive has run at as much as 10 MBPS and 5K TPS.
----------------------------------
EXPLAIN ANALYZE output:
The plan is eminently reasonable. But there's no visible relationship
between the top-level "actual time" and the "Total runtime":
Nested Loop Left Join
(cost=0.00..31109.64 rows=9980 width=351)
(actual time=0.289..2357.346 rows=9980 loops=1)
Filter: ("inner".id IS NULL)
-> Nested Loop
(cost=0.00..735.56 rows=9980 width=351)
(actual time=0.092..1917.677 rows=9980 loops=1)
Join Filter: (("outer".name)::text = ("inner".reason)::text)
-> Seq Scan on m_reason r
(cost=0.00..1.01 rows=1 width=12)
(actual time=0.008..0.050 rows=1 loops=1)
-> Seq Scan on tmp_message t
(cost=0.00..609.80 rows=9980 width=355)
(actual time=0.067..1756.617 rows=9980 loops=1)
-> Index Scan using pk_message on message
(cost=0.00..3.02 rows=1 width=8)
(actual time=0.014..0.014 rows=0 loops=9980)
Index Cond: ("outer".id = message.id)
Total runtime: 737401.687 ms
--
"Dreams come true, not free." -- S.Sondheim, ITW