From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)sun(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: Benchmark Data requested
Date: 2008-02-05 14:24:55
Message-ID: 1202221496.4252.680.camel@ebony.site
Lists: pgsql-performance
On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote:
> Hi,
>
> On Monday, 04 February 2008, Jignesh K. Shah wrote:
> > Single stream loader of PostgreSQL takes hours to load data. (Single
> > stream load... wasting all the extra cores out there)
>
> I wanted to work on this at the pgloader level, so the CVS version of
> pgloader is now able to load data in parallel, with a Python thread per
> configured section (1 section = 1 data file = 1 table is often the case).
> It's not configurable at the moment, but I plan on providing a "threads"
> knob which will default to 1 and could be set to -1 for "as many threads
> as sections".
That sounds great. I was just thinking of asking for that :-)
I'll look at COPY FROM internals to make this faster. I'm looking at
this now to refresh my memory; I already had some plans on the shelf.
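
For illustration only, a minimal sketch of the one-thread-per-section idea,
assuming psycopg2 and hypothetical table/file names and DSN; pgloader's
actual code differs:

import threading
import psycopg2

# Hypothetical "sections": one (table, data file) pair each.
SECTIONS = [
    ("lineitem", "lineitem.tbl"),
    ("orders",   "orders.tbl"),
]

def load_section(table, path, dsn="dbname=tpch"):
    # One connection per thread, so the COPY commands really run in parallel.
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur, open(path) as f:
            cur.copy_expert("COPY %s FROM STDIN WITH DELIMITER '|'" % table, f)
    finally:
        conn.close()

threads = [threading.Thread(target=load_section, args=s) for s in SECTIONS]
for t in threads:
    t.start()
for t in threads:
    t.join()
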
> > Multiple table loads (1 per table) spawned via script are a bit better,
> > but they hit WAL problems.
>
> pgloader will hit the WAL problem too, but it may still have its benefits;
> at least we will soon be able (you already can if you take it from CVS) to
> measure whether parallel loading on the client side is a good idea
> performance-wise.
Should be able to reduce lock contention, but not overall WAL volume.
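
As an aside, the usual way to dodge most of the WAL volume is to CREATE or
TRUNCATE the target table in the same transaction as the COPY while WAL
archiving is disabled; roughly (a sketch with placeholder table/file names,
psycopg2 assumed):

import psycopg2

conn = psycopg2.connect("dbname=tpch")       # placeholder DSN
with conn, conn.cursor() as cur, open("lineitem.tbl") as f:
    cur.execute("TRUNCATE lineitem")         # same transaction as the COPY below
    cur.copy_expert("COPY lineitem FROM STDIN WITH DELIMITER '|'", f)
conn.close()

PostgreSQL can skip WAL logging for the COPY in that case because the table
file can simply be fsync'ed at commit instead.
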
> [...]
> > I have not even started partitioning of tables yet, since with the
> > current framework you have to load the data separately into each
> > table, which means that for the TPC-H data you need extra logic to take
> > that table data and split it across the partition child tables. Not
> > stuff that many people want to do by hand.
>
> I'm planning to add ddl-partitioning support to pgloader:
> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00460.php
>
> The basic idea is for pgloader to ask PostgreSQL about constraint_exclusion,
> pg_inherits and pg_constraint; if pgloader recognizes both the CHECK
> expression and the datatypes involved, and if we can implement the CHECK in
> Python without having to resort to querying PostgreSQL, then we can run a
> thread per partition, with as many COPY FROM commands running in parallel as
> there are partitions involved (when threads = -1).
>
> I'm not sure this will be quicker than relying on the PostgreSQL triggers or
> rules currently used for partitioning, but ISTM the paragraph Jignesh quoted
> is about exactly that.
Much better than triggers and rules, but it will be hard to get it to
work.
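
For reference, the catalog lookup such a loader needs is roughly this (a
psycopg2 sketch with a placeholder parent table and DSN, not actual pgloader
code); parsing the returned CHECK expressions into Python predicates is the
hard part:

import psycopg2

conn = psycopg2.connect("dbname=tpch")       # placeholder DSN
cur = conn.cursor()
cur.execute("""
    SELECT c.relname, pg_get_constraintdef(co.oid)
      FROM pg_inherits i
      JOIN pg_class c       ON c.oid = i.inhrelid
      JOIN pg_constraint co ON co.conrelid = c.oid AND co.contype = 'c'
     WHERE i.inhparent = %s::regclass
""", ("orders",))                            # placeholder parent table
for child, check_expr in cur.fetchall():
    # Each row is a child partition plus its CHECK clause, e.g. the input
    # a client-side row router would have to understand.
    print(child, check_expr)
conn.close()
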
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com