From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: reducing IO and memory usage: sending the content of a table to multiple files
Date: 2009-04-03 09:09:56
Message-ID: 20090403110956.477596a8@dawn.webthatworks.it
Lists: pgsql-general
On Fri, 3 Apr 2009 02:05:19 +0100
Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Thu, Apr 02, 2009 at 09:48:33PM +0200, Ivan Sergio Borgonovo
> wrote:
> > I didn't find any elegant example of cursor use in PHP... OK, PHP
> > is not the most elegant language around... but still, any good
> > example someone could point me at?
> You can obviously increase the "FETCH" upwards and if you're
> feeling fancy you could even run the FETCH async from the code
> that processes the results. Maybe something like:
What kind of advantage would I get from running it asynchronously?
Oh, I didn't mean to imply you were promising any advantage... just
wondering.
It could be an option if, once everything is up, I want to keep the
resources sucked up by this process under control (?).
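Just to be sure I got what you mean by running the FETCH async:
something roughly like this, with pg_send_query() so the next batch is
being produced server side while I format and gzip the previous one?
(cursor name, batch size and column names are made up)

pg_query($db, 'BEGIN');
pg_query($db, 'DECLARE cur NO SCROLL CURSOR FOR SELECT col3, col4 FROM bigtable');
pg_send_query($db, 'FETCH 1000 FROM cur');          // first batch, asynchronously
do {
    $res = pg_get_result($db);                      // block until the batch arrives
    while (pg_get_result($db) !== false) {}         // drain before the next send
    $rows = pg_fetch_all($res);
    if (!empty($rows)) {
        pg_send_query($db, 'FETCH 1000 FROM cur');  // next batch runs while we format
        foreach ($rows as $row) {
            // build the XML fragments and gzwrite() them here
        }
    }
} while (!empty($rows));
pg_query($db, 'CLOSE cur');
pg_query($db, 'COMMIT');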
> > So I think the largest cost of the operation will be IO.
> > \copy should be optimised for "raw" data output, but maybe all
> > its advantages get lost once I've to use pipes and adding
> > complexity to filtering.
> Streaming IO is pretty fast, I think you'll be hard pushed to keep
> up with it from PHP and you'll end up CPU bound in no time. Be
> interesting to find out though.
Filtering is currently very simple... I'm building a very simple XML,
just queueing constant strings and whatever comes out of the DB.
But if I had to parse CSV (split), assign names to columns (and I
expect this stuff to be adjusted frequently) or use regexps... I was
expecting to waste more human cycles and CPU cycles than whatever I'd
gain from the optimised IO of \copy (if any).
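Just to make it concrete, I guess the \copy route would look more or
less like this (the psql invocation and the tab split are off the top
of my head, column names made up):

$cmd = 'psql -At -c "COPY (SELECT col3, col4 FROM bigtable) TO STDOUT" mydb';
$in = popen($cmd, 'r');
while (($line = fgets($in)) !== false) {
    // default COPY text format: tab separated, one row per line
    list($col3, $col4) = explode("\t", rtrim($line, "\n"));
    // build the XML fragments from $col3/$col4 and gzwrite() them
}
pclose($in);

i.e. an extra process, an extra pipe and an extra parsing step just to
get back the columns I already had.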
Most of the operations end up being:
$output = SOMEHEADER;
gzwrite($f1, $output);
gzwrite($f2, $output);
gzwrite($f3, $output);
gzwrite($f4, $output);
while (...) {
    $record1 = '<sometag someattr>' . SOMECONST1 . $row['col4']
        . '</sometag><someothertag someattr>' . $row['col3'] . '</someothertag>';
    $record2 = '<sometag someattr>' . SOMECONST2 . $row['col4']
        . '</sometag><someothertag someattr>' . $row['col3'] . '</someothertag>';
    // each output file gets the record built for it
    gzwrite($f1, $record1);
    gzwrite($f2, $record1);
    gzwrite($f3, $record2);
    gzwrite($f4, $record2);
}
$output = SOMEFOOTER;
gzwrite($f1, $output);
gzwrite($f2, $output);
gzwrite($f3, $output);
gzwrite($f4, $output);
I have to slice the largest table of my DB into multiple XML files
that have to be written to disk.
So it is actually 1) reading the whole table, returning about 30% of
its fields, and 2) writing all that data multiple times.
This is by far the largest write load the server is going to incur in
a day.
And it may well be the largest CPU load it is going to incur in a day
too, considering I have to gzip all the files.
Still, I think I've read on this list that compression is going to be
more of a bottleneck than IO.
I just did a preliminary test: building the XML and gzipping 80K
records out of 1M takes less than 2 sec, so maybe I was
over-concerned.
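If compression ever does turn out to be the bottleneck, I suppose the
easiest knob is the level passed to gzopen(), something like (file
name made up):

$f1 = gzopen('/var/tmp/feed1.xml.gz', 'w1'); // 'w1' = cheap on CPU, 'w9' = smallest files
gzwrite($f1, SOMEHEADER);
// ... the per-record writes from the loop above ...
gzwrite($f1, SOMEFOOTER);
gzclose($f1);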
Anyway, I'd like to understand a bit better how IO and memory
consumption are managed with a cursor vs. a plain select, with the
client drivers in the middle.
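My current understanding (please correct me if it's wrong) is that
with a plain select libpq materialises the *whole* result set on the
client before pg_query() returns, while with a cursor only one FETCH
worth of rows is ever in the client's memory:

// plain select: the whole result set is buffered client side
$res = pg_query($db, 'SELECT col3, col4 FROM bigtable');

// cursor: at most 1000 rows at a time on the client
pg_query($db, 'BEGIN');
pg_query($db, 'DECLARE cur NO SCROLL CURSOR FOR SELECT col3, col4 FROM bigtable');
while (($res = pg_query($db, 'FETCH 1000 FROM cur')) && pg_num_rows($res) > 0) {
    while ($row = pg_fetch_assoc($res)) {
        // format and gzwrite() the row
    }
}
pg_query($db, 'CLOSE cur');
pg_query($db, 'COMMIT');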
> > I was reading about all the php documents and trying to
> > understand how buffers and memory usage works, so I gave a look
> > to MySQL documents too...
>
> Not sure about PG, but the C api pretty much always buffers
> everything in memory first. There was mention of getting control
> of this, but I've got no idea where it got.
buffer *everything*?
thanks
--
Ivan Sergio Borgonovo
http://www.webthatworks.it