Re: copy vs. C function

From: Sam Gendler <sgendler(at)ideasculptor(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: copy vs. C function
Date: 2011-12-11 02:35:09
Message-ID: 71697165-78B8-4095-9185-B900540BAC0B@ideasculptor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Start a transaction before the first insert and commit it after the last one and it will be much better, but I believe that the copy code path is optimized to perform better than any set of queries can, even in a single transaction

Sent from my iPhone

On Dec 10, 2011, at 5:27 PM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:

> I was experimenting with a few different methods of taking a line of
> text, parsing it, into a set of fields, and then getting that info
> into a table.
>
> The first method involved writing a C program to parse a file, parse
> the lines and output newly-formatted lines in a format that
> postgresql's COPY function can use.
> End-to-end, this takes 15 seconds for about 250MB (read 250MB, parse,
> output new data to new file -- 4 seconds, COPY new file -- 10
> seconds).
>
> The next approach I took was to write a C function in postgresql to
> parse a single TEXT datum into an array of C strings, and then use
> BuildTupleFromCStrings. There are 8 columns involved.
> Eliding the time it takes to COPY the (raw) file into a temporary
> table, this method took 120 seconds, give or take.
>
> The difference was /quite/ a surprise to me. What is the probability
> that I am doing something very, very wrong?
>
> NOTE: the code that does the parsing is actually the same,
> line-for-line, the only difference is whether the routine is called by
> a postgresql function or by a C program via main, so obviously the
> overhead is elsewhere.
> NOTE #2: We are talking about approximately 2.6 million lines.
>
> I was testing:
>
> \copy some_table from 'some_file.csv' with csv
> vs.
> insert into some_table select (some_func(line)).* from some_temp_table;
>
> where some_func had been defined with (one) IN TEXT and (8) OUT params
> of varying types.
>
> PostgreSQL 9.1.1 on Linux, x86_64
>
> --
> Jon
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jon Nelson 2011-12-11 03:08:39 Re: copy vs. C function
Previous Message Craig Ringer 2011-12-11 02:32:50 Re: copy vs. C function