Re: INSERT extremely slow with large data sets

From: Slavisa Garic <Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au>
To: Dann Corbit <DCorbit(at)connx(dot)com>
Cc: Slavisa Garic <Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: INSERT extremely slow with large data sets
Date: 2003-11-15 01:11:40
Message-ID: Pine.GSO.4.10.10311151204540.11651-100000@bruce.csse.monash.edu.au
Lists: pgsql-hackers

Hi Dann

Here is the schema. Also, could you be more specific about the COPY
command? And does talking directly to the API speed things up? (I am new
to databases but I am learning quickly.)

-- NimrodEnfJob --

create table NimrodEnfJob(
    exp_id      INTEGER not null references NimrodEnfExperiment,
    task_id     INTEGER not null references NimrodTask,
    pgroup_id   INTEGER not null references NimrodParameterGroup,
    agent_id    INTEGER references NimrodAgent on delete set null,
    jobname     varchar(255) not null,
    admin       char(1) not null default 'F'
                check (admin in ('F', 'T')),
    taskname    varchar(255) not null,
    param_text  TEXT not null,
    open        char(1) not null default 'F'
                check (open in ('F', 'T')),
    control     varchar(8) not null default 'start'
                check (control in ('nostart', 'start', 'stop')),
    status      varchar(16) not null default 'ready'
                check (status in ('ready', 'executing', 'failed', 'done')),
    cpulength   real not null,
    sleeptime   real not null,
    filesize    real not null,
    cputime     real,
    waittime    real,
    filetime    real,
    filebytes   integer,
    priority    integer not null default 100,
    create_time timestamp not null default CURRENT_TIMESTAMP,
    start_time  timestamp,
    finish_time timestamp,
    budget      real not null default 0.0,
    servername  varchar(255),
    error_info  varchar(255) not null default '',
    more_info   TEXT not null default '',
    primary key (exp_id, jobname),
    foreign key (exp_id, taskname) references NimrodEnfTask
);

Also, these are the indexes on this table. I created them on the columns
that are most commonly accessed:
create unique index nej_idx
ON NimrodEnfJob (exp_id, pgroup_id);

create unique index nej_idx1
ON NimrodEnfJob (pgroup_id);

create index nej_idx2
ON NimrodEnfJob (status);

create unique index nej_idx3
ON NimrodEnfJob (status, pgroup_id);

create index nej_idx4
ON NimrodEnfJob (status, agent_id);

create index nej_idx5
ON NimrodEnfJob (agent_id);

I did notice that removing those indexes doesn't improve things by much.
Similar behaviour is observed; it just takes a bit less time to insert
(about 0.01s less than usual at 60000 records).
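
As an aside, every one of those indexes has to be updated on each INSERT,
so a common bulk-load pattern is to drop the purely secondary indexes
before a big load and rebuild them afterwards. A sketch only (whether it
helps here is untested, and the unique indexes are kept so their
constraints still hold during the load):

    -- Drop the non-unique indexes before the bulk load...
    DROP INDEX nej_idx2;
    DROP INDEX nej_idx4;
    DROP INDEX nej_idx5;

    -- ...run the bulk INSERTs (or COPY) here...

    -- ...then rebuild each index in a single scan over the table.
    CREATE INDEX nej_idx2 ON NimrodEnfJob (status);
    CREATE INDEX nej_idx4 ON NimrodEnfJob (status, agent_id);
    CREATE INDEX nej_idx5 ON NimrodEnfJob (agent_id);

Rebuilding an index once over N rows is cheaper than maintaining it
incrementally through N single-row inserts.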

Regards,
Slavisa

On Fri, 14 Nov 2003, Dann Corbit wrote:

> > -----Original Message-----
> > From: Slavisa Garic [mailto:Slavisa(dot)Garic(at)infotech(dot)monash(dot)edu(dot)au]
> > Sent: Thursday, November 13, 2003 11:37 PM
> > To: pgsql-hackers(at)postgresql(dot)org
> > Subject: [HACKERS] INSERT extremely slow with large data sets
> >
> >
> > Hi Everyone,
> >
> > This is my first post here so please tell me to go somewhere
> > else if this is the wrong place to post questions like this.
> >
> > I am using PostgreSQL 7.3.2 and have used earlier versions
> > (7.1.x onwards), and with all of them I noticed the same problem
> > with INSERTs when there is a large data set. Just so you
> > can compare, here is the time it takes to insert one row into a table
> > when there are only a few rows present and when there are thousands:
> >
> > Rows Present      Start Time        Finish Time
> > ------------------------------------------------
> >          100      1068790804.12     1068790804.12
> >         1000      1068790807.87     1068790807.87
> >         5000      1068790839.26     1068790839.27
> >        10000      1068790909.24     1068790909.26
> >        20000      1068791172.82     1068791172.85
> >        30000      1068791664.06     1068791664.09
> >        40000      1068792369.94     1068792370.0
> >        50000      1068793317.53     1068793317.6
> >        60000      1068794369.38     1068794369.47
> >
> > As you can see, it takes an awful lot of time for me just to
> > have those values inserted. To make the picture a bit
> > clearer for you: this table holds lots of information,
> > about 25 columns. There are also a few indexes that I created
> > so that selecting values from the table is faster,
> > which by the way works fine. Selecting anything takes under 5 seconds.
> >
> > Any help would be greatly appreciated, even pointing me in the
> > right direction where to ask this question. By the way, I
> > designed the database this way because my application uses
> > PGSQL a lot during execution, so there was a huge need for
> > fast SELECTs. Our experiments are getting larger and larger
> > every day, so fast inserts would be good as well.
> >
> > Just to note, those times above are of INSERTs only; nothing
> > else was being done that would be included in those times. The
> > machine was also free, that was the only process running the whole
> > time, and the machine was an Intel(R) Pentium(R) 4 CPU at 2.40GHz.
>
> You should post the schema for the table in question when you ask a
> question like this.
>
> The behavior is not surprising in the least. Every database will
> perform this way, since you have mentioned that you have indexes on
> the table.
>
> The depth of the tree will be proportional to the log of the row count.
> As the tree gets deeper, inserts will be more and more expensive.
>
> If you have a giant pile of stuff to insert, consider the COPY command
> or API if it is time critical.
>
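
For reference, the COPY command mentioned above bulk-loads rows from a
file in a single statement rather than issuing one INSERT per row. A
minimal sketch (the file path is a made-up placeholder, and the file
must supply every column of the table as tab-separated text, one row per
line):

    -- Load all rows in one pass; much less per-row overhead than INSERT.
    -- Reading from a server-side file requires superuser rights;
    -- from psql, \copy does the same thing client-side without them.
    COPY NimrodEnfJob FROM '/tmp/nimrod_jobs.tab';
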
