From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: copy from performance on large tables with indexes
Date: 2007-06-07 09:17:40
Message-ID: CA896D7906BF224F8A6D74A1B7E54AB301750AAB@JENMAIL01.ad.intershop.net
Lists: pgsql-performance
Hello,
Postgres: 8.2
OS: Linux, 4 CPUs, 4 GB RAM, RAID 1, 32-bit system
work_mem: 600 MB
I have some tables which may become quite large (currently up to 6 GB).
I initially fill them using copy from (files).
The import is fast enough as long as I only have a primary key on the
table: about 18 minutes (over 300 MB/minute).
Then I need 5 additional indexes on it. Creation time: 30 minutes.
Subsequently I compute some aggregations, which need an additional
4 hours and 30 minutes.
And now the problem:
When I get additional data for the table, the import becomes much slower
due to the indexes (about 30 times slower!).
The performance degradation is probably due to the fact that the indexes
are too large to be kept in memory.
Moreover, I guess that the index fill factors are too high (90%).
During this second import, I have about 20% iowait time.
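For illustration, a lower fill factor could be given at index creation
(8.2 accepts it as a storage parameter); index and column names here are
made up:

create index Aix_1 on A (col1) with (fillfactor = 70);
-- leaves ~30% free space in each leaf page for future insertions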
The usual solution is to drop the indexes before the second import and
rebuild them afterwards, but I feel unconfident doing this as I don't
know how the system will react if some SELECT statements occur while
the indexes are missing. I can hardly avoid this.
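(A rough sketch of that usual approach, with made-up index and column
names; wrapping it in a transaction would at least make concurrent
SELECTs block on the table lock instead of failing, but they would then
wait for the whole import:)

begin;
drop index Aix_1;
-- ... drop the other indexes, then import ...
copy A from '/path/to/file';
create index Aix_1 on A (col1);
-- ... recreate the other indexes ...
commit;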
So my idea for the second import process:
1) make a copy of the table:
create table B as select * from A;
alter table B add constraint B_pk primary key (id);
2) import the new data in table B
copy B from 'file';
3) create the required indexes on B
create index Bix_1 on B..
create index Bix_2 on B..
create index Bix_3 on B..
create index Bix_4 on B..
4) replace table A with table B
alter table A rename to A_trash;
alter table B rename to A;
drop table A_trash;
(and rename the indexes to get the original state)
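For example (assuming the original indexes were named Aix_1, Aix_2, ...):

alter index Bix_1 rename to Aix_1;
alter index Bix_2 rename to Aix_2;
-- ... and so on for the remaining indexes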
This seems to work, but with side effects:
The only objects that refer to the tables are functions and indexes.
If a function is called within the same session before and after the
table renaming, the second attempt fails (or uses the table A_trash if
it still exists). So I should close the session and start a new one
before further processing. Errors in other live sessions are
acceptable, but maybe you know a way to avoid them?
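To illustrate the problem with a made-up function that counts the rows
of A:

create function f_count_a() returns bigint as $$
begin
  return (select count(*) from A);
end;
$$ language plpgsql;

select f_count_a();  -- caches a plan pointing to the current table A
-- ... table renaming as in step 4 ...
select f_count_a();  -- same session: still uses the old table (A_trash),
                     -- or fails once A_trash has been dropped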
And now a few questions :-)
- do you see any issue that would prevent this workflow from working?
- are there any other side effects to take care of?
- what is the maximum acceptable value of the work_mem parameter for my
configuration (see the complete configuration below)?
- has anybody built a similar workflow?
- could this be a feature request, to extend the capabilities of copy
from?
Thanks for your time and attention,
Marc Mamin