From: | "H(dot) Hall" <hhall1001(at)reedyriver(dot)com> |
---|---|
To: | |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: update 600000 rows |
Date: | 2007-12-16 11:42:09 |
Message-ID: | 47650F11.9000108@reedyriver.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
okparanoid(at)free(dot)fr wrote:
> Hello
>
> i have a python script to update 600000 rows to one table from a csv file in my
> postgres database and it takes me 5 hours to do the transaction...
>
> I'm on debian etch with 8.1 postgres server on a 64 bits quad bi opteron.
>
> I have desactived all index except the primary key who is not updated since it's
> the reference column of the update too.
>
> When i run this script the server is not used by any other user.
>
> First when i run htop i see that the memory used is never more than 150 MB.
> I don't understand in this case why setting shmall and shmmax kernel's
> parameters to 16 GB of memory (the server has 32 GB) increase the rapidity of
> the transaction a lot compared to a shmall and shmax in (only) 2 GB ?!
>
> The script is run with only one transaction and pause by moment to let the time
> to postgres to write data to disk.
>
> If the data were writed at the end of the transaction will be the perfomance
> better ? i wan't that in production data regulary writed to disk to prevent
> loosinf of data but it there any interest to write temporary data in disk in a
> middle of a transaction ???
>
> I'm completely noob to postgres and database configuration and help are
> welcome.
>
> thank
You will get a huge improvement in time if you use batch updates instead
of updating a row at a time. See:
http://www.postgresql.org/docs/8.2/interactive/populate.html
and
http://www.postgresql.org/docs/8.2/interactive/sql-begin.html
You will also get a big improvement if you can turn fsync off during the
update. See:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-wal.html
You also need to vacuum the table after doing that many updates since pg
does a delete and insert on each update, there will be a lot of holes.
Cheers
HH
From | Date | Subject | |
---|---|---|---|
Next Message | James Mansion | 2007-12-16 17:48:06 | Re: libgcc double-free, backend won't die |
Previous Message | Bruce Momjian | 2007-12-16 06:39:48 | Re: RAID arrays and performance |