Re: Inserting 26 million rows takes 8 hours, how to

From: Jose Vicente Nunez Z <josevnz(at)newbreak(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Inserting 26 million rows takes 8 hours, how to
Date: 2003-09-16 15:27:46
Message-ID: 1063726066.5407.46.camel@linux0037
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Tom,

On Tue, 2003-09-16 at 10:44, Tom Lane wrote:
> Jose Vicente Nunez Z <josevnz(at)newbreak(dot)com> writes:
> > For that i'm using the PostgreSQL copy tool, but the problem is that is
> > taking almost 9 hours to load the data:
>
> Hmm, 889 rows/second doesn't sound that bad, considering that you've
> given us no data about the hardware you're running on. In particular,
> what's your disk setup?

My fault, i should provide more information about my setup.

I got two ATA disks (one for PostgreSQL, the other for the data files).
The box is a Pentium III 1.3Ghz, dual CPU with 2 GB of RAM. One disk is
30 GB, the other is 40GB.

As you can see is not state of the art harware, but for development
purposes i should get a decent performance (only one or two applications
will be accessing the server to do some data mining).

>
> But at any rate, the only bottlenecks I could think of for a COPY
> command are:
>
> 1. Data conversion. This would depend on the column datatypes of the
> table, which you didn't tell us.
>
create table mytable
(
a varchar(20) not null,
b varchar(20) not null,
c varchar(20) not null,
d char(6),
f int null,
g float not null,
h float not null
)

Data conversions shouldn't be that complicated, isn't it?

> 2. Index updating. This would depend on the indexes on the table,
> which you didn't tell us.

I had one index and dropped it for the load. Also i turn of the 'fsync'
flag (fsync=false) and now the load time is only 15 minutes!. But then
recreating the original index takes forever:

create index myindex on mytable ( a, b, c );

For some reason i don't see much I/O but a lot of CPU ussage:

11:18:05 up 3 days, 20:55, 3 users, load average: 1.00, 1.00, 0.92
51 processes: 49 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 96.0% user 3.0% system 0.0% nice 0.0% iowait 0.0%
idle
CPU1 states: 0.1% user 0.0% system 0.0% nice 0.0% iowait 99.0%
idle
Mem: 2064404k av, 2049312k used, 15092k free, 0k shrd, 149544k
buff
406064k actv, 1076876k in_d, 345612k in_c
Swap: 2047744k av, 4252k used, 2043492k free 1809160k
cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU
COMMAND
1882 postgres 25 0 41492 40M 36080 R 98.6 2.0 87:34 0
postmaster
16998 root 15 0 0 0 0 SW 1.9 0.0 4:30 0
kjournald
1956 root 19 0 1152 1152 848 R 0.9 0.0 0:00 1 top
1 root 15 0 476 448 420 S 0.0 0.0 0:07 1 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0
migration/0
3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1
migration/1
4 root 15 0 0 0 0 SW 0.0 0.0 0:00 1
keventd
5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0
ksoftirqd_CPU0
6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1
ksoftirqd_CPU1
11 root 19 0 0 0 0 SW 0.0 0.0 0:00 0
bdflush
7 root 15 0 0 0 0 SW 0.0 0.0 0:22 0 kswapd
8 root 15 0 0 0 0 SW 0.0 0.0 0:00 1
kscand/DMA
9 root 15 0 0 0 0 SW 0.0 0.0 1:57 0
kscand/Normal
10 root 15 0 0 0 0 SW 0.0 0.0 2:28 0
kscand/HighMem
12 root 15 0 0 0 0 SW 0.0 0.0 0:04 0
kupdated

Also seems to be than memory usage is ok, i don't see any process
blocked or waiting for resources, nor swapped out due lack of memory:

[root(at)linux0105 data]# vmstat 10 10
procs memory swap io
system cpu
r b w swpd free buff cache si so bi bo in cs us
sy id
1 0 0 4252 15092 149564 1809160 0 0 60 43 1 16
2 1 33
1 0 0 4252 15092 149564 1809160 0 0 0 1820 134 12
49 1 50
1 0 0 4252 15092 149564 1809160 0 0 0 1282 125 8
49 1 50
1 0 0 4252 15092 149568 1809160 0 0 0 1202 124 9
50 1 50
1 0 0 4252 15092 149572 1809160 0 0 0 1216 123 9
50 1 50
1 0 0 4252 15092 149576 1809160 0 0 0 1224 125 8
50 0 50
1 0 0 4252 15092 149580 1809160 0 0 0 1212 124 10
50 1 50
1 0 0 4252 15092 149584 1809160 0 0 0 1206 123 8
50 1 50
1 0 0 4252 15092 149588 1809160 0 0 0 1212 125 8
49 1 50
1 0 0 4252 15092 149592 1809160 0 0 0 1208 125 9
49 1 50

Any ideas about what i can do to speed up the Index creation? Why is
using so much CPU (doesn't seem to be writting much information to disk
though).

>
> 3. Foreign-key constraint checking. This would depend on the foreign
> keys the table has, which you didn't tell us.
>

None so far, is just one table.

> In short ... you'd better provide a lot more detail if you want
> useful commentary.
>
> > I apologize if this not the proper place to post this questions.
>
> pgsql-performance would be more appropriate, probably.

I'm reading some documentation (checking the list archives):

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Hopefully this would help me to fix the problem.

Thanks for the help,

JV.

>
> regards, tom lane
--
Jose Vicente Nunez Zuleta (josevnz at newbreak dot com)
Newbreak LLC System Administrator
http://www.newbreak.com
RHCE, SCJD, SCJP

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2003-09-16 15:40:51 Re: Inserting 26 million rows takes 8 hours, how to improve those times?
Previous Message Tom Lane 2003-09-16 14:44:36 Re: Inserting 26 million rows takes 8 hours, how to improve those times?