From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
---|---|
To: | "Slavisa Garic" <Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au> |
Cc: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: INSERT extremely slow with large data sets |
Date: | 2003-11-15 03:40:28 |
Message-ID: | D90A5A6C612A39408103E6ECDD77B829408C54@voyager.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: Slavisa Garic [mailto:Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au]
> Sent: Friday, November 14, 2003 5:12 PM
> To: Dann Corbit
> Cc: Slavisa Garic; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] INSERT extremely slow with large data sets
>
>
> Hi Dann
>
> Here is the schema and also could you just be more specific
> on COPY command.
http://www.postgresql.org/docs/7.3/static/sql-copy.html
And
http://techdocs.postgresql.org/techdocs/usingcopy.php
May be helpful.
> ALso does talking dirrectly to API speed
> things up ? (I am new to databases but i am learning quickly)
Not particularly. This is the copy command API:
http://www.postgresql.org/docs/7.3/static/libpq-copy.html
What the API can allow you to do (for instance) would be to never let
the data touch the ground. Instead of writing to a text file or even a
binary format copy input file, you use the API to take the incoming data
and insert it directly.
Like everything else, there is a dark side. Read the documents and they
will explain it. But if you need to move a giant pile of data into the
database as fast as possible, it is the copy command that is the most
efficient.
> -- NimrodEnfJob --
>
> create table NimrodEnfJob(
> exp_id INTEGER not null references NimrodEnfExperiment,
> task_id INTEGER not null references NimrodTask,
> pgroup_id INTEGER not null references
> NimrodParameterGroup,
> agent_id INTEGER references NimrodAgent on
> delete set null,
> jobname varchar(255) not null,
> admin char(1) not null default 'F'
> check (admin in ('F', 'T')),
> taskname varchar(255) not null,
> param_text TEXT not null,
> open char(1) not null default 'F'
> check (open in ('F', 'T')),
> control varchar(8) not null default 'start'
> check (control in ('nostart', 'start', 'stop')),
> status varchar(16) not null default 'ready'
> check (status in ('ready', 'executing', 'failed',
> 'done')),
> cpulength real not null,
> sleeptime real not null,
> filesize real not null,
> cputime real,
> waittime real,
> filetime real,
> filebytes integer,
> priority integer not null default 100,
> create_time timestamp not null default CURRENT_TIMESTAMP,
> start_time timestamp,
> finish_time timestamp,
> budget real not null default 0.0,
> servername varchar(255),
> error_info varchar(255) not null default '',
> more_info TEXT not null default '',
> primary key (exp_id, jobname),
> foreign key (exp_id, taskname) references NimrodEnfTask
> );
>
> Also these are the indexes on this table. I created them on
> the columnt that are most commonly accessed:
> create unique index nej_idx
> ON NimrodEnfJob (exp_id, pgroup_id);
>
> create unique index nej_idx1
> ON NimrodEnfJob (pgroup_id);
>
> create index nej_idx2
> ON NimrodEnfJob (status);
>
> create unique index nej_idx3
> ON NimrodEnfJob (status, pgroup_id);
>
> create index nej_idx4
> ON NimrodEnfJob (status, agent_id);
>
> create index nej_idx5
> ON NimrodEnfJob (agent_id);
>
> I did notice that removing those indexes doesn't import by
> much. Similar behaviour is observed but it just takes a bit
> less time to insert (0.01 less then usually at 60000 records)
I am quite surprised that removing the indexes does not have a large
impact on insert speed, especially, since you have 6 of them. Most of
the other costs that I can think of are fixed for inserts into a "bare
table". Perhaps someone with more intimate knowledge of the inner
working may know why inserts into a table without any index will trail
off in speed as the table grows.
[snip]
From | Date | Subject | |
---|---|---|---|
Next Message | Marc G. Fournier | 2003-11-15 03:42:46 | Re: oh dear ... |
Previous Message | Christopher Kings-Lynne | 2003-11-15 03:38:18 | Re: ALTER TABLE modifications |