Insert performance slows down in large batch

From: "Jeremy Haile" <jhaile(at)fastmail(dot)fm>
To: pgsql-performance(at)postgresql(dot)org
Subject: Insert performance slows down in large batch
Date: 2005-12-01 17:49:11
Message-ID: 1133459351.23834.248779098@webmail.messagingengine.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I am importing roughly 15 million rows in one batch transaction. I am
currently doing this through batch inserts of around 500 at a time,
although I am looking at ways to do this via multiple (one-per-table)
copy commands for performance reasons.

I am currently running: PostgreSQL 8.0.4, Redhat Enterprise Linux 4,
ext3, all-on-one partition. I am aware of methods of improving
performance by changing ext3 mounting options, splitting WAL, data, and
indexes to separate physical disks, etc. I have also adjusted my
shared_buffers, work_mem, maintenance_work_mem, and checkpoint_segments
and can post their values if anyone thinks it is relevant to my question
(See questions at the bottom)

What confuses me is that at the beginning of the import, I am inserting
roughly 25,000 rows every 7 seconds..and by the time I get towards the
end of the import, it is taking 145 seconds for the same number of rows.
The inserts are spread across 4 tables and I have dropped all indexes
and constraints on these tables, including foreign keys, unique keys,
and even primary keys (even though I think primary key doesn't improve
performance) The entire bulk import is done in a single transaction.

The result is a table with 4.8 million rows, two tables with 4.8*2
million rows, and another table with several thousand rows.

So, my questions are:
1) Why does the performance degrade as the table sizes grow? Shouldn't
the insert performance remain fairly constant if there are no indexes or
constraints?

2) Is there anything I can do to figure out where the time is being
spent? Will postgres log any statistics or information to help me
diagnose the problem? I have pasted a fairly representative sample of
vmstat below my e-mail in case it helps, although I'm not quite how to
interpret it in this case.

3) Any other advice, other than the things I listed above (I am aware of
using copy, ext3 tuning, multiple disks, tuning postgresql.conf
settings)?

Thanks in advance,
Jeremy Haile

#vmstat 2 20
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us sy
id wa
1 0 9368 4416 2536 1778784 0 0 124 51 3 2 2
0 96 2
1 0 9368 4416 2536 1778784 0 0 0 0 1005 53 25
0 75 0
1 1 9368 3904 2544 1779320 0 0 12164 6 1103 262 24
1 59 16
1 0 9368 3704 2552 1779380 0 0 16256 24 1140 344 23
1 53 23
1 1 9368 2936 2560 1780120 0 0 16832 6 1143 359 23
1 52 24
1 1 9368 3328 2560 1779712 0 0 13120 0 1111 285 24
1 58 18
1 0 9368 4544 2560 1778556 0 0 5184 0 1046 141 25
0 67 8
1 1 9368 3776 2568 1779296 0 0 7296 6 1064 195 24
0 67 9
1 0 9368 4480 2568 1778548 0 0 4096 0 1036 133 24
0 69 6
1 0 9368 4480 2576 1778608 0 0 7504 0 1070 213 23
0 67 10
1 0 9368 3136 2576 1779900 0 0 9536 0 1084 235 23
0 66 10
1 1 9368 3072 2584 1779960 0 0 13632 6 1118 313 24
1 60 16
1 0 9368 4480 2592 1778592 0 0 8576 24 1075 204 24
0 63 12
1 0 9368 4480 2592 1778592 0 0 0 6 1004 52 25
0 75 0
1 0 9368 4544 2600 1778652 0 0 0 6 1005 55 25
0 75 0
1 1 9368 3840 2600 1779332 0 0 11264 4 1098 260 24
0 63 13
1 1 9368 3072 2592 1780156 0 0 17088 14 1145 346 24
1 51 24
1 1 9368 4096 2600 1779128 0 0 16768 6 1140 360 23
1 54 21
1 1 9368 3840 2600 1779332 0 0 16960 0 1142 343 24
1 54 22
1 0 9368 3436 2596 1779676 0 0 16960 0 1142 352 24
1 53 23

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rick Schumeyer 2005-12-01 17:58:12 COPY into table too slow with index
Previous Message Marc G. Fournier 2005-12-01 17:31:31 pg_stat* values ...