Re: VACUUM degrades performance significantly. Database becomes unusable!

From: "Stephen" <jleelim(at)xxxxxx(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: VACUUM degrades performance significantly. Database becomes unusable!
Date: 2003-10-16 17:40:02
Message-ID: YJAjb.10797$G26.657@nntp-post.primus.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Good news,

I partially fixed the problem on Linux 2.4. It appears the responsiveness
can be improved significantly by tuning the disk IO elevator in Linux using
"elvtune" in util-linux. The elevator in Linux is used to re-order
read/write requests to reduce disk seeks by ordering requests according to
disk sectors. Unfortunately, the elevator in kernel 2.4 is not very smart
(or flexible I should say depending on your needs) and can starve a
read/write request for a long time if not properly tuned.

See my older thread I wrote long time ago. Back then I didn't have too much
time to test different values especially for VACUUM:

http://groups.google.com/groups?q=linux+responsiveness+stephen+postgresql&hl
=en&lr=&ie=UTF-8&oe=UTF-8&selm=F92Znj0TrJIPT6nhdBf00021ae6%40hotmail.com&rnu
m=1

See also:

http://strasbourg.linuxfr.org/jl3/features-2.3-1.html

Below are the results using different elvtune values running repeatedly when
VACUUM'ing.

elvtune -r 2048 -w 8192 /dev/hdc (default Redhat 9):
====================================================

tsdb=# explain analyze select * from table1 where id =
'336139b47b7faf09fc4d4f03680a4ce5';
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=2290.07..2290.10 rows=1 loops=1)
Index Cond: (id = '336139b47b7faf09fc4d4f03680a4ce5'::character varying)
Total runtime: 2290.22 msec
(3 rows)

elvtune -r 128 -w 8192 /dev/hdc:
================================

tsdb=# explain analyze select * from table1 where id =
'008ab286d725d2ea0b3269c89fc01ce2';
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=450.29..450.32 rows=1 loops=1)
Index Cond: (id = '008ab286d725d2ea0b3269c89fc01ce2'::character varying)
Total runtime: 450.46 msec
(3 rows)

elvtune -r 64 -w 8192 /dev/hdc:
===============================

tsdb=# explain analyze select * from table1 where id =
'0078997ac809877c1a0d1f76af753608';
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=206.00..206.01 rows=1 loops=1)
Index Cond: (id = '0078997ac809877c1a0d1f76af753608'::character varying)
Total runtime: 206.14 msec
(3 rows)

elvtune -r 32 -w 8192 /dev/hdc:
===============================

tsdb=# explain analyze select * from table1 where id =
'00c45490c9f24858c17d7dfb98c5def5';
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=210.61..210.62 rows=1 loops=1)
Index Cond: (id = '00c45490c9f24858c17d7dfb98c5def5'::character varying)
Total runtime: 210.75 msec
(3 rows)

elvtune -r 8 -w 8192 /dev/hdc:
==============================

tsdb=# explain analyze select * from table1 where id =
'006ea95ef6b9b8f0ddcb1f33c40190ec';
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=338.04..338.05 rows=1 loops=1)
Index Cond: (id = '006ea95ef6b9b8f0ddcb1f33c40190ec'::character varying)
Total runtime: 338.18 msec
(3 rows)

elvtune -r 1 -w 8192 /dev/hdc:
==============================

tsdb=# explain analyze select * from table1 where id =
'0025a7a9182d5456474a72f773433c01';
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=390.41..390.42 rows=1 loops=1)
Index Cond: (id = '0025a7a9182d5456474a72f773433c01'::character varying)
Total runtime: 390.55 msec
(3 rows)

elvtune -r 0 -w 8192 /dev/hdc:
==============================

tsdb=# explain analyze select * from table1 where id =
'0023783eda7e2a5f434e55a66c3a0a11';
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=246.29..246.32 rows=1 loops=1)
Index Cond: (id = '0023783eda7e2a5f434e55a66c3a0a11'::character varying)
Total runtime: 246.44 msec
(3 rows)

tsdb=# explain analyze select * from table1 where id =
'001eacb36161ac6a1f860bd391dce5c2';
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------
Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=346)
(actual time=14.11..14.12 rows=1 loops=1)
Index Cond: (id = '001eacb36161ac6a1f860bd391dce5c2'::character varying)
Total runtime: 14.25 msec
(3 rows)

Remarks:
========

In all cases, VACUUM completed in the same amount of time, process load
average is reduced slightly and vmstat IO is the same.

It turns out that as the -r read value in elvtune is made smaller, the more
responsive the read becomes up to a certain point when the disk needs to do
too many unoptimized seeks. A too small read value can cause the response
time to fluctuate more than a large value.

Understandably, the elvtune values are quite specific depending on what you
do. In my case, where the table is mostly read-only, "elevtune -r 64 -w 8192
/dev/hdc" works best. It should also depend on your expected load, purpose,
disk type, size and settings.

VACUUM is still disk IO intensive. Even with "elvtune -r 64 -w 8192
/dev/hdc" and VACUUM, the response time has gone down to 200 msec from 2200
msec (10 times factor), is still high compared to normal queries at 25 msec.
VACUUM needs to clamped down much more!

Regards,

Stephen

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2003-10-16 17:51:18 Re: Why I can't combine %TYPE with [] ?
Previous Message Tom Lane 2003-10-16 17:38:25 Re: problem with function to report how many records were changed