Re: Optimization required for multiple insertions in PostgreSQL

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "siva palanisamy" <psivait(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimization required for multiple insertions in PostgreSQL
Date: 2011-11-03 18:32:48
Message-ID: 4EB29800020000250004299D@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

[Please keep the list copied.]

siva palanisamy <psivait(at)gmail(dot)com> wrote:

> Could you pls guide me on how to minimize time consumption? I've
> postgresql 8.1.4; Linux OS.

Well, the first thing to do is to use a supported version of
PostgreSQL. More recent releases perform better, for starters.

http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy

Whichever major release you use, you should be up-to-date on bug
fixes, some of which are fixes for bugs which cause performance
problems:

http://www.postgresql.org/support/versioning

> I'm yet to check its RAM and other memory capacity but I
> guess it would've the necessary stuffs.

Knowing what hardware you have, and what your current PostgreSQL
configuration setting are, would allow us to suggest what you might
reconfigure to tune your database.

> My master table's schema is
>
> CREATE TABLE contacts ( contact_id SERIAL PRIMARY KEY,
> contact_type INTEGER DEFAULT 0, display_name TEXT NOT NULL DEFAULT
> '', first_name TEXT DEFAULT '', last_name TEXT DEFAULT '',
> company_name TEXT DEFAULT '', last_updated TIMESTAMP NOT NULL
> DEFAULT current_timestamp, UNIQUE(display_name) ) WITHOUT OIDS;

Not that this is a performance issue, but you almost certainly will
expect the semantics provided by TIMESTAMP WITH TIME ZONE for your
last_updated column. Just specifying TIMESTAMP is probably going to
give you an unpleasant surprise somewhere down the road.

> I've a sql function that is called from a C program where
> parameters are being passed. It is replicated for the other 2
> tables as well. Totally, I've 3 tables.

Which table is the source of your slowness, and how do you know
that?

> FYI, database connection is opened for the 1st and closed
> only after the last record is attempted. Do you think these
> constraints take a lot of time?

The only constraints you've shown are PRIMARY KEY and UNIQUE. It is
somewhat slower to add rows with those constraints in place than to
blast in data without the constraints and then add the constraints;
but I understand that if the data is not known to be clean and free
of duplicates, that's not possible. That certainly doesn't account
for the timings you describe.

> taking days to complete 20000 odd records are not encouraging!

I've seen PostgreSQL insert more rows than that per second, so it's
not like it is some inherent slowness of PostgreSQL. There is
something you're doing with it that is that slow. Getting onto a
modern version of PostgreSQL may help a lot, but most likely there's
something you're not telling us yet that is the thing that really
needs to change.

Just as one off-the-wall example of what *can* happen -- if someone
disabled autovacuum and had a function which did an update to all
rows in a table each time the function was called, they would see
performance like you describe. How do I know, from what you've told
me, that you're *not* doing that? Or one of a hundred other things
I could postulate? (Hint, if you showed us your current PostgreSQL
settings I could probably have ruled this out.)

-Kevin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-11-03 18:41:16 Re: Predicates not getting pushed into SQL function?
Previous Message Claudio Freire 2011-11-03 17:55:52 Re: Blocking excessively in FOR UPDATE