From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Joshua D. Drake" <jd(at)commandprompt(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: pg_dump additional options for performance
Date: 2008-02-27 10:19:28
Message-ID: 200802271119.28655.dfontaine@hi-media.com
Lists: pgsql-hackers

On Tuesday, 26 February 2008, Joshua D. Drake wrote:
> > Think 100GB+ of data that's in a CSV or delimited file. Right now
> > the best import path is with COPY, but it won't execute very fast as
> > a single process. Splitting the file manually will take a long time
> > (time that could be spent loading instead) and substantially increase
> > disk usage, so the ideal approach would figure out how to load in
> > parallel across all available CPUs against that single file.
>
> You mean load from position? That would be very, very cool.
Did I mention that pgloader now does exactly this when configured as follows:
http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
section_threads = N
split_file_reading = True
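
To give an idea of what split_file_reading amounts to, here is a rough Python
sketch of the technique only, not pgloader's actual code: each worker seeks to
its own byte offset, rounds up to the next line boundary, and streams just its
slice to the server with COPY. The connection string, table name and file path
below are made up for the example, and the file is assumed to have no header
line.

# Rough sketch only: parallel COPY of one big CSV by byte ranges.
# DSN, TABLE and PATH are hypothetical; this is not pgloader's code,
# it just illustrates the split-by-offset idea.
import os
import threading
import psycopg2

DSN = "dbname=test"        # hypothetical connection string
TABLE = "measurements"     # hypothetical target table
PATH = "/tmp/big.csv"      # hypothetical input file, no header line
WORKERS = 4                # plays the role of section_threads = N

def find_ranges(path, workers):
    """Cut the file into byte ranges that begin and end on line boundaries."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, workers):
            f.seek(i * size // workers)
            f.readline()              # advance to the next full line
            offsets.append(f.tell())
    offsets.append(size)
    # drop degenerate ranges (tiny files where two seek points hit the same line)
    return [(a, b) for a, b in zip(offsets[:-1], offsets[1:]) if b > a]

class ByteRangeReader:
    """File-like object exposing only bytes [start, end), so copy_expert()
    streams exactly one worker's slice of the file."""
    def __init__(self, f, start, end):
        self.f = f
        self.f.seek(start)
        self.remaining = end - start

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""
        if size < 0 or size > self.remaining:
            size = self.remaining
        data = self.f.read(size)
        self.remaining -= len(data)
        return data

def load_range(start, end):
    conn = psycopg2.connect(DSN)
    try:
        with open(PATH, "rb") as f:
            cur = conn.cursor()
            cur.copy_expert("COPY %s FROM STDIN WITH CSV" % TABLE,
                            ByteRangeReader(f, start, end))
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    threads = [threading.Thread(target=load_range, args=r)
               for r in find_ranges(PATH, WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

The point is that the single input file is read only once and never split on
disk: each worker gets a disjoint byte range that ends on a newline.
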
IIRC, Simon and Greg Smith asked for pgloader to grow these parallel loading
features so we could get some initial numbers and a sense of the performance
gain, as a first step toward designing a parallel COPY implementation in the
backend.
Hope this helps,
--
dim