Quick Links

Re: INSERT extremely slow with large data sets

From:	"Dann Corbit" <DCorbit(at)connx(dot)com>
To:	"Slavisa Garic" <Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: INSERT extremely slow with large data sets
Date:	2003-11-15 03:40:28
Message-ID:	D90A5A6C612A39408103E6ECDD77B829408C54@voyager.corporate.connx.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> -----Original Message-----
> From: Slavisa Garic [mailto:Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au]
> Sent: Friday, November 14, 2003 5:12 PM
> To: Dann Corbit
> Cc: Slavisa Garic; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] INSERT extremely slow with large data sets
>
>
> Hi Dann
>
> Here is the schema and also could you just be more specific
> on COPY command.

http://www.postgresql.org/docs/7.3/static/sql-copy.html

And

http://techdocs.postgresql.org/techdocs/usingcopy.php

May be helpful.

> ALso does talking dirrectly to API speed
> things up ? (I am new to databases but i am learning quickly)

Not particularly. This is the copy command API:
http://www.postgresql.org/docs/7.3/static/libpq-copy.html

What the API can allow you to do (for instance) would be to never let
the data touch the ground. Instead of writing to a text file or even a
binary format copy input file, you use the API to take the incoming data
and insert it directly.

Like everything else, there is a dark side. Read the documents and they
will explain it. But if you need to move a giant pile of data into the
database as fast as possible, it is the copy command that is the most
efficient.

> -- NimrodEnfJob --
>
> create table NimrodEnfJob(
> exp_id INTEGER not null references NimrodEnfExperiment,
> task_id INTEGER not null references NimrodTask,
> pgroup_id INTEGER not null references
> NimrodParameterGroup,
> agent_id INTEGER references NimrodAgent on
> delete set null,
> jobname varchar(255) not null,
> admin char(1) not null default 'F'
> check (admin in ('F', 'T')),
> taskname varchar(255) not null,
> param_text TEXT not null,
> open char(1) not null default 'F'
> check (open in ('F', 'T')),
> control varchar(8) not null default 'start'
> check (control in ('nostart', 'start', 'stop')),
> status varchar(16) not null default 'ready'
> check (status in ('ready', 'executing', 'failed',
> 'done')),
> cpulength real not null,
> sleeptime real not null,
> filesize real not null,
> cputime real,
> waittime real,
> filetime real,
> filebytes integer,
> priority integer not null default 100,
> create_time timestamp not null default CURRENT_TIMESTAMP,
> start_time timestamp,
> finish_time timestamp,
> budget real not null default 0.0,
> servername varchar(255),
> error_info varchar(255) not null default '',
> more_info TEXT not null default '',
> primary key (exp_id, jobname),
> foreign key (exp_id, taskname) references NimrodEnfTask
> );
>
> Also these are the indexes on this table. I created them on
> the columnt that are most commonly accessed:
> create unique index nej_idx
> ON NimrodEnfJob (exp_id, pgroup_id);
>
> create unique index nej_idx1
> ON NimrodEnfJob (pgroup_id);
>
> create index nej_idx2
> ON NimrodEnfJob (status);
>
> create unique index nej_idx3
> ON NimrodEnfJob (status, pgroup_id);
>
> create index nej_idx4
> ON NimrodEnfJob (status, agent_id);
>
> create index nej_idx5
> ON NimrodEnfJob (agent_id);
>
> I did notice that removing those indexes doesn't import by
> much. Similar behaviour is observed but it just takes a bit
> less time to insert (0.01 less then usually at 60000 records)

I am quite surprised that removing the indexes does not have a large
impact on insert speed, especially, since you have 6 of them. Most of
the other costs that I can think of are fixed for inserts into a "bare
table". Perhaps someone with more intimate knowledge of the inner
working may know why inserts into a table without any index will trail
off in speed as the table grows.

[snip]

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Marc G. Fournier	2003-11-15 03:42:46	Re: oh dear ...
Previous Message	Christopher Kings-Lynne	2003-11-15 03:38:18	Re: ALTER TABLE modifications