From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Alon Goldshuv <agoldshuv(at)greenplum(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: COPY FROM performance improvements |
Date: | 2005-06-24 03:58:42 |
Message-ID: | 200506240358.j5O3wga20563@candle.pha.pa.us |
Lists: | pgsql-hackers |
Sounds great!
---------------------------------------------------------------------------
Alon Goldshuv wrote:
> This is a second iteration of a previous thread that went unresolved a few
> weeks ago. I made some more modifications to the code to make it compatible
> with the current COPY FROM code, and it should be more agreeable this time.
>
> The main premise of the new code is that it improves the text data parsing
> speed by about 4-5x, resulting in total improvements of between 15% and
> 95% for data importing (the higher-range gains will occur on large data rows
> without many columns, implying more parsing and less converting to internal
> format). This is done by replacing char-at-a-time parsing with buffered
> parsing, and also by using fast scan routines and a minimal amount of
> loading/appending into the line and attribute buf.
>
> The new code passes both COPY regression tests (copy, copy2) and doesn't
> break any of the others.
>
> It also supports encoding conversions (thanks to Peter and Tatsuo for your
> feedback) and the 3 line-end types. Having said that, using COPY with
> different encodings was only minimally tested. We are looking into creating
> new tests and hopefully adding them to the postgres regression suite one day
> if it's desired by the community.
>
> This new code improves the delimited data format parsing. BINARY and CSV
> will stay the same and will be executed separately for now (therefore there
> is some code duplication). In the future I plan to write improvements to the
> CSV path too, so that it will be executed without duplication of code.
>
> I am still missing support for data that uses COPY_OLD_FE (question: what are
> the use cases? When will it be used? Please advise).
>
> I'll send out the patch soon. It's basically there to show that there is a
> way to load data faster. In future releases of the patch it will be more
> complete and elegant.
>
> I'd appreciate any comments or advice.
>
> Alon.
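For readers following along, the difference between char-at-a-time and buffered parsing that Alon describes above can be sketched roughly as follows. This is not the patch itself (which is C code inside COPY); it is a minimal Python illustration under simplifying assumptions: only `\n` line ends, a tab delimiter, no escape handling, and no encoding conversion. The function names `parse_char_at_a_time` and `parse_buffered` and the `chunk_size` parameter are hypothetical and exist only for this example.

```python
import io


def parse_char_at_a_time(stream, delim=b"\t"):
    """Baseline: fetch one byte per read call and append it to the attribute."""
    rows, row, attr = [], [], bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            break
        if ch == delim:
            # End of attribute: store it and start a new one.
            row.append(bytes(attr))
            attr.clear()
        elif ch == b"\n":
            # End of line: store the last attribute and the finished row.
            row.append(bytes(attr))
            attr.clear()
            rows.append(row)
            row = []
        else:
            attr += ch
    if attr or row:
        # Final line without a trailing newline.
        row.append(bytes(attr))
        rows.append(row)
    return rows


def parse_buffered(stream, delim=b"\t", chunk_size=65536):
    """Buffered: read large chunks, then scan each chunk for line ends."""
    rows, pending = [], b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = pending + chunk
        start = 0
        while True:
            # Fast scan for the next line end instead of testing every byte.
            end = buf.find(b"\n", start)
            if end == -1:
                break
            rows.append(buf[start:end].split(delim))
            start = end + 1
        # Keep the incomplete trailing line for the next chunk.
        pending = buf[start:]
    if pending:
        rows.append(pending.split(delim))
    return rows
```

Both functions should return the same rows for the same input; the buffered version simply does far fewer per-byte operations, which is the general idea behind the speedup claimed above. For example, `parse_buffered(io.BytesIO(b"1\talice\n2\tbob\n"))` yields two rows of two attributes each.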
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073