From: PFC <lists(at)peufeu(dot)com>
To: "Thomas Finneid" <tfinneid(at)student(dot)matnat(dot)uio(dot)no>, pgsql-performance(at)postgresql(dot)org
Subject: Re: insert vs select into performance
Date: 2007-07-18 11:07:18
Message-ID: op.tvnqigq6cigqcu@apollo13
Lists: pgsql-performance
>> It's the time to parse statements, plan, execute, roundtrips with
>> the client, context switches, time for your client library to escape
>> the data and encode it and for postgres to decode it, etc. In a word:
>> OVERHEAD.
>
> I know there is some overhead, but that much when running it batched...?
Well, yeah ;)
> Unfortunately it's not fast enough; it needs to be done in no more than
> 1-2 seconds (and in production it will be maybe 20-50 columns of data,
> perhaps divided over 5-10 tables).
> Additionally it needs to scale to perhaps three times as many columns
> and perhaps 2-3 times as many rows in some situations, within 1 second.
> Furthermore it needs to allow for about 20-50 clients reading much of
> that data before the next batch of data arrives.
Wow. What is the application?
Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty
slow):
test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms
test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 721.579 ms
OK, so as you can see, insert speed is pretty fast: roughly 140K rows/s
here. With a better CPU and faster disks, you can get a lot more.
test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms
test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey"
for table "test"
ALTER TABLE
Time: 100.577 ms
test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 1915.928 ms
This includes the time to update the index: the same insert is now about
2.7x slower.
test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms
test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms
test=> CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms
test=> SELECT test_insert();
 test_insert
-------------

(1 row)
Time: 1885.382 ms
As you can see, performing 100K individual INSERTs inside a plpgsql
function is also fast.
The postgres engine itself is pretty damn fast; it's the communication
overhead that you feel, especially the switching between client and server
processes.
Another example:
=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms
A multi-row VALUES list is actually pretty fast too. Here there is no
context switching; everything is done in one INSERT.
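For illustration, the shape of that statement with just three tuples (the
real one carried all 100000):
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5,1), (1,2,3,4,5,2),
(1,2,3,4,5,3);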
However, COPY is much faster still, because parsing and de-escaping the
data cost less per row; COPY is optimized for throughput.
So, my advice:
For optimum throughput, have your application build chunks of data into
text files and load them with COPY. Or, if your client library supports
the COPY interface, use it directly.
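For instance, a minimal sketch (the path and the default tab-separated
format are just assumptions for the example):
COPY test (a,b,c,d,e,f) FROM '/tmp/batch.txt';
Or client-side with psql, so the file doesn't need to live on the server:
\copy test (a,b,c,d,e,f) from '/tmp/batch.txt'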
You will need a fast disk system with the xlog and the data on separate
disks, several CPU cores (one insert thread will max out one core; use the
others for selects), lots of RAM so index updates don't need to seek, and
tuning of the bgwriter and checkpoints to avoid load spikes.
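As a starting point, something like this in postgresql.conf (the values
are illustrative guesses, not measured recommendations; tune for your own
hardware):
checkpoint_segments = 64       # more WAL between checkpoints, fewer spikes
checkpoint_timeout = 15min
bgwriter_delay = 200ms         # how often the background writer wakes up
bgwriter_lru_maxpages = 200    # dirty pages written per round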