Re: insert vs select into performance

From: PFC <lists(at)peufeu(dot)com>
To: "Thomas Finneid" <tfinneid(at)student(dot)matnat(dot)uio(dot)no>, pgsql-performance(at)postgresql(dot)org
Subject: Re: insert vs select into performance
Date: 2007-07-18 11:07:18
Message-ID: op.tvnqigq6cigqcu@apollo13
Lists: pgsql-performance


>> It's the time to parse statements, plan, execute, roundtrips with
>> the client, context switches, time for your client library to escape
>> the data and encode it and for postgres to decode it, etc. In a word:
>> OVERHEAD.
>
> I know there is some overhead, but that much when running it batched...?

Well, yeah ;)

> Unfortunately it's not fast enough; it needs to be done in no more than
> 1-2 seconds (and in production it will be maybe 20-50 columns of data,
> perhaps divided over 5-10 tables).
> Additionally, it needs to scale to perhaps three times as many columns
> and perhaps 2-3 times as many rows in some situations within 1 second.
> Further on, it needs to allow for about 20-50 clients reading much of
> that data before the next batch of data arrives.

Wow. What is the application?

Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID 1 (pretty
slow):

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 721.579 ms

OK, so you see, insert speed is pretty fast. With a better CPU and faster
disks, you can get a lot more.

test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey"
for table "test"
ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 1915.928 ms

This includes the time to update the index.

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
test_insert
-------------

(1 row)

Time: 1885.382 ms

Now you see, performing 100K individual inserts inside a plpgsql function
is also fast.
The postgres engine is pretty damn fast; it's the communication overhead
that you feel, especially switching between client and server processes.
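
If the inserts really have to arrive one by one from the client, a prepared
statement at least removes the parse and plan cost from every round trip.
A minimal sketch against the test table above (the statement name "ins" is
just an example):

PREPARE ins (INT, INT, INT, INT, INT, INT) AS
    INSERT INTO test (a,b,c,d,e,f) VALUES ($1,$2,$3,$4,$5,$6);

EXECUTE ins(1,2,3,4,5,42);  -- still one round trip per EXECUTE, but no re-parse/re-plan
DEALLOCATE ins;

It only shaves parsing and planning, though; the per-statement round trip
stays, so it will never come close to COPY.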

Another example:

=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

A multi-row VALUES list is actually pretty fast. Here there are no context
switches; everything is done in one INSERT.
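
For reference, the multi-row form (available since 8.2) just lists one
parenthesised tuple per row; a tiny sketch of the syntax, the real statement
above carried 100000 tuples:

INSERT INTO test (a,b,c,d,e,f) VALUES
    (1,2,3,4,5,1),
    (1,2,3,4,5,2),
    (1,2,3,4,5,3);   -- ...and so on up to the batch size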

However, COPY is much faster still, because parsing and de-escaping the data
costs less; COPY is optimized for bulk throughput.
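
For example, loading the same six columns from a tab-separated text file
(the path is only an illustration):

-- server-side COPY: the file must be readable by the server process
COPY test (a,b,c,d,e,f) FROM '/tmp/test_batch.txt';

-- or, from psql, a client-side copy of the same file:
\copy test (a,b,c,d,e,f) from '/tmp/test_batch.txt'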

So, advice:

For optimum throughput, have your application build chunks of data into
text files and load them with COPY, or use the copy interface if your client
library supports it.
You will also need a fast disk subsystem with the xlog and the data on
separate disks, several CPU cores (one insert thread will max out one core,
so leave the others for the selects), enough RAM that index updates don't
have to seek, and some tuning of the bgwriter and checkpoints to avoid load
spikes.
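
For that last point, something along these lines in postgresql.conf could be
a starting point (values are only illustrative guesses for an 8.2-era machine,
not recommendations for your hardware):

# spread checkpoints out so they don't all hit during the insert burst
checkpoint_segments = 32        # default of 3 is far too small for heavy insert load
checkpoint_timeout  = 15min

# let the background writer trickle dirty pages out continuously
bgwriter_delay        = 200ms
bgwriter_lru_maxpages = 1000

# larger WAL buffers help big batched writes
wal_buffers = 1MB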
