From: Cody Caughlan <toolbag(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow queries / commits, mis-configuration or hardware issues?
Date: 2011-11-14 18:16:46
Message-ID: CAPVp=gYs7uYF_hy0MDBkpRCxNbWQHPSyP6sXO-6SJZ_TvaU=XQ@mail.gmail.com
Lists: pgsql-performance
Hi, running Postgres 9.1.1 on an EC2 m1.xlarge instance. Machine is a
dedicated master with 2 streaming replication nodes.
The machine has 16GB of RAM and 4 cores.
We're starting to see slow queries, and slow COMMITs in particular are
happening more frequently. The slow queries are against seemingly
well-indexed tables.
I have log_min_duration_statement = 150ms
Slow commits like:
2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG: duration: 3062.784 ms
statement: COMMIT
2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG: duration: 2593.351 ms
statement: COMMIT
These slow COMMITs are against tables that received a large number of
UPDATEs and are growing fairly rapidly.
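Since this box also feeds two streaming replicas, I want to rule out commits waiting on synchronous replication rather than on the local WAL flush. A quick check I plan to run (just a sketch; I never set synchronous_standby_names, so I expect both standbys to show as async):

  -- commits only wait on a standby if synchronous replication is configured
  SHOW synchronous_commit;
  SHOW synchronous_standby_names;
  -- sync_state should read 'async' for both replicas
  SELECT application_name, state, sync_state FROM pg_stat_replication;

If both standbys are async, the slow COMMITs are presumably stuck on the local WAL fsync.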
And slow queries like:
2011-11-14 17:50:20 UTC pid:6519 (16/0-0) LOG: duration: 1694.456 ms
statement: SELECT "facebook_wall_posts".* FROM "facebook_wall_posts"
WHERE "facebook_wall_posts"."token" =
'984c44e75975b224b38197cf8f8fc76a' LIMIT 1
query plan: http://explain.depesz.com/s/wbm
The # of rows in facebook_wall_posts is 5841
Structure of facebook_wall_posts:
Table "public.facebook_wall_posts"
Column | Type |
Modifiers
--------------------+-----------------------------+------------------------------------------------------------------
id | integer | not null default
nextval('facebook_wall_posts_id_seq'::regclass)
album_id | integer | not null
token | character varying(32) | not null
fb_recipient_id | character varying(64) | not null
post_id | character varying(100) | not null
fb_post_created_at | timestamp without time zone |
data | text |
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
fb_post_deleted_at | timestamp without time zone |
Indexes:
"facebook_wall_posts_pkey" PRIMARY KEY, btree (id)
"index_facebook_wall_posts_on_token" UNIQUE, btree (token)
"index_facebook_wall_posts_on_album_id" btree (album_id)
And another slow query:
2011-11-14 17:52:44 UTC pid:14912 (58/0-0) LOG: duration: 979.740 ms
statement: SELECT "facebook_friends"."friend_id" FROM
"facebook_friends" WHERE "facebook_friends"."user_id" = 9134671
Query plan: http://explain.depesz.com/s/x1q
# of rows in facebook_friends is 27075088
Structure of facebook_friends:
Table "public.facebook_friends"
Column | Type |
Modifiers
------------+-----------------------------+---------------------------------------------------------------
id | integer | not null default
nextval('facebook_friends_id_seq'::regclass)
user_id | integer | not null
friend_id | integer | not null
created_at | timestamp without time zone |
Indexes:
"facebook_friends_pkey" PRIMARY KEY, btree (id)
"index_facebook_friends_on_user_id_and_friend_id" UNIQUE, btree
(user_id, friend_id)
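With ~27M rows, the user_id predicate should still be satisfiable by the leading column of the composite unique index above. A local re-check along the same lines (sketch):

  -- the leading column (user_id) of the composite index should cover this predicate
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT "facebook_friends"."friend_id"
    FROM "facebook_friends"
   WHERE "facebook_friends"."user_id" = 9134671;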
We have autovacuum enabled and running, but yesterday I also ran a
manual VACUUM on the database. Autovacuum settings:
autovacuum | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.2
autovacuum_vacuum_threshold | 50
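To confirm autovacuum is actually keeping up on the heavily updated tables, I'm planning a check along these lines (sketch; columns as they appear in pg_stat_user_tables on 9.1):

  -- dead-tuple counts and last (auto)vacuum/analyze times, worst offenders first
  SELECT relname, n_live_tup, n_dead_tup,
         last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
   ORDER BY n_dead_tup DESC
   LIMIT 10;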
other postgresql.conf settings:
shared_buffers = 3584MB
wal_buffers = 16MB
checkpoint_segments = 32
max_wal_senders = 10
checkpoint_completion_target = 0.9
wal_keep_segments = 1024
maintenance_work_mem = 256MB
work_mem = 88MB
effective_cache_size = 10GB
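As a rough sanity check on whether shared_buffers and effective_cache_size are in the right ballpark for the working set, I can also look at the cumulative buffer hit ratio (sketch; this counts from the last stats reset, so it's only indicative):

  -- cluster-wide buffer cache hit percentage since the last stats reset
  SELECT sum(blks_hit) AS hits,
         sum(blks_read) AS reads,
         round(sum(blks_hit) * 100.0 / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS hit_pct
    FROM pg_stat_database;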
The PGDATA dir is a RAID10 on 4 local ("ephemeral" in EC2 speak)
drives. I ran some dd tests and received the following output:
--- WRITING ---
root(at)sql03:/data# time sh -c "dd if=/dev/zero of=/data/tmp/bigfile
bs=8k count=4000000 && sync"
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 670.663 s, 48.9 MB/s
real 11m52.199s
user 0m2.720s
sys 0m45.330s
--- READING ---
root(at)sql03:/data# time dd of=/dev/zero if=/data/tmp/bigfile bs=8k
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 155.429 s, 211 MB/s
real 2m35.434s
user 0m2.400s
sys 0m33.160s
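Since sequential dd through the page cache says little about commit latency, I also plan to measure flush behavior on the same volume. A sketch (assumes GNU dd for conv=fdatasync; pg_test_fsync ships in contrib, so the exact binary path depends on how 9.1 was installed):

  # write test that forces the data to disk instead of stopping at the page cache
  dd if=/dev/zero of=/data/tmp/syncfile bs=8k count=100000 conv=fdatasync
  # fsync-rate test run on the same filesystem the WAL lives on
  cd /data/tmp && pg_test_fsync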
I have enabled log_checkpoints and here is a recent sample from the log:
2011-11-14 17:38:48 UTC pid:3965 (-0) LOG: checkpoint complete: wrote
15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8
recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync
files=60, longest=1.484 s, average=0.049 s
2011-11-14 17:39:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:43:49 UTC pid:3965 (-0) LOG: checkpoint complete: wrote
16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9
recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync
files=82, longest=2.943 s, average=0.050 s
2011-11-14 17:44:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:48:47 UTC pid:3965 (-0) LOG: checkpoint complete: wrote
14159 buffers (3.1%); 0 transaction log file(s) added, 0 removed, 6
recycled; write=269.818 s, sync=2.119 s, total=271.948 s; sync
files=71, longest=1.192 s, average=0.029 s
2011-11-14 17:49:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:53:47 UTC pid:3965 (-0) LOG: checkpoint complete: wrote
11337 buffers (2.5%); 0 transaction log file(s) added, 0 removed, 7
recycled; write=269.901 s, sync=2.508 s, total=272.419 s; sync
files=71, longest=1.867 s, average=0.035 s
2011-11-14 17:54:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:58:48 UTC pid:3965 (-0) LOG: checkpoint complete: wrote
15706 buffers (3.4%); 0 transaction log file(s) added, 0 removed, 7
recycled; write=270.104 s, sync=3.612 s, total=273.727 s; sync
files=67, longest=3.051 s, average=0.053 s
2011-11-14 17:59:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
I've been collecting random samples from pg_stat_bgwriter:
https://gist.github.com/4faec2ca9a79ede281e1
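For context, the columns I'm watching in those samples can be pulled with something like (sketch; 9.1 column set):

  -- backend-written buffers and backend fsyncs are the numbers I'm worried about
  SELECT checkpoints_timed, checkpoints_req,
         buffers_checkpoint, buffers_clean, buffers_backend,
         buffers_backend_fsync, maxwritten_clean, buffers_alloc
    FROM pg_stat_bgwriter;

If buffers_backend_fsync keeps climbing, backends are being forced to perform their own fsyncs, which would line up with the slow COMMITs.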
So given all this information (if you need more just let me know), is
there something fundamentally wrong or mis-configured? Do I have an
I/O issue?
Thanks for any insight.