From: PFC <lists(at)peufeu(dot)com>
To: "Thomas Finneid" <tfinneid(at)student(dot)matnat(dot)uio(dot)no>, pgsql-performance(at)postgresql(dot)org
Subject: Re: insert vs select into performance
Date: 2007-07-18 11:07:18
Message-ID: op.tvnqigq6cigqcu@apollo13
Lists: pgsql-performance
>> It's the time to parse statements, plan, execute, roundtrips with
>> the client, context switches, time for your client library to escape
>> the data and encode it and for postgres to decode it, etc. In a word:
>> OVERHEAD.
>
> I know there is some overhead, but that much when running it batched...?
Well, yeah ;)
> Unfortunately it's not fast enough; it needs to be done in no more than
> 1-2 seconds (and in production it will be maybe 20-50 columns of data,
> perhaps divided over 5-10 tables).
> Additionally it needs to scale to perhaps three times as many columns
> and perhaps 2-3 times as many rows in some situations, within 1 second.
> Furthermore it needs to allow for about 20-50 clients reading much of
> that data before the next batch of data arrives.
Wow. What is the application?
Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty
slow):
test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms
test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 721.579 ms
OK, so as you can see, insert speed is pretty fast: roughly 140K rows/s
here. With a better CPU and faster disks, you can get a lot more.
test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms
test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey"
for table "test"
ALTER TABLE
Time: 100.577 ms
test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 1915.928 ms
This includes the time to update the index: the same insert is now about
2.7x slower.
test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms
test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms
test=> CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms
test=> SELECT test_insert();
 test_insert
-------------

(1 row)
Time: 1885.382 ms
As you can see, performing 100K individual INSERTs inside a plpgsql
function is also fast.
The postgres engine itself is pretty damn fast; it's the communication
overhead that you feel, especially the switching between client and server
processes.
Another example:
=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms
A multi-row VALUES list is actually pretty fast too. Here there is no
context switching; everything is done in one INSERT.
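For illustration, the shape of that statement with just three tuples (the
real one carried all 100000):
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5,1), (1,2,3,4,5,2),
(1,2,3,4,5,3);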
However, COPY is much faster still, because parsing and de-escaping the
data cost less per row; COPY is optimized for throughput.
So, my advice:
For optimum throughput, have your application build chunks of data into
text files and load them with COPY. Or, if your client library supports
the COPY interface, use it directly.
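For instance, a minimal sketch (the path and the default tab-separated
format are just assumptions for the example):
COPY test (a,b,c,d,e,f) FROM '/tmp/batch.txt';
Or client-side with psql, so the file doesn't need to live on the server:
\copy test (a,b,c,d,e,f) from '/tmp/batch.txt'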
You will need a fast disk system with the xlog and the data on separate
disks, several CPU cores (one insert thread will max out one core; use the
others for selects), lots of RAM so index updates don't need to seek, and
tuning of the bgwriter and checkpoints to avoid load spikes.
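As a starting point, something like this in postgresql.conf (the values
are illustrative guesses, not measured recommendations; tune for your own
hardware):
checkpoint_segments = 64       # more WAL between checkpoints, fewer spikes
checkpoint_timeout = 15min
bgwriter_delay = 200ms         # how often the background writer wakes up
bgwriter_lru_maxpages = 200    # dirty pages written per round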