| From: | Andreas Pflug <pgadmin(at)pse-consulting(dot)de> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Inefficient bytea escaping? | 
| Date: | 2006-05-25 18:37:16 | 
| Message-ID: | 4475F95C.8080307@pse-consulting.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Tom Lane wrote:
> Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> 
>>When dumping the table with psql \copy (non-binary), the resulting file 
>>would be 6.6GB of size, taking about 5.5 minutes. Using psql \copy WITH 
>>BINARY (modified psql as posted to -patches), the time was cut down to 
>>21-22 seconds (filesize 1.4GB as expected), which is near the physical 
>>throughput of the target disk. If server based COPY to file is used, The 
>>same factor 12 can be observed, CPU is up to 100 % (single P4 3GHz 2MB 
>>Cache HT disabled, 1GB main mem).
> 
> 
> This is with an 8.0.x server, right?
I've tested both 8.0.5 and 8.1.4, no difference observed.
> Testing a similar case with CVS HEAD, I see about a 5x speed difference,
> which is right in line with the difference in the physical amount of
> data written.
That's what I would have expected, apparently the data is near worst case.
   (I was testing a case where all the bytes were emitted as
> '\nnn', so it's the worst case.)  oprofile says the time is being spent
> in CopyAttributeOutText() and fwrite().  So I don't think there's
> anything to be optimized here, as far as bytea goes: its binary
> representation is just inherently a lot smaller.
Unfortunately, binary isn't the cure for all, since copying normal data 
with binary option might bloat that by factor two or so. I wish there 
was a third option that's fine for both kinds of data. That's not only a 
question of dump file sizes, but also of network throughput (an online 
compression in the line protocol would be desirable for this).
> Looking at CopySendData, I wonder whether any traction could be gained
> by trying not to call fwrite() once per character.  I'm not sure how
> much per-call overhead there is in that function.  We've done a lot of
> work trying to optimize the COPY IN path since 8.0, but nothing much
> on COPY OUT ...
Hm, I'll see whether I can manage to check CVS head too, and see what's 
happening, not a production alternative though.
Regards,
Andreas
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2006-05-25 18:46:37 | Re: Inefficient bytea escaping? | 
| Previous Message | Joshua D. Drake | 2006-05-25 18:24:27 | 8.1.4 build failure on ICC 9.1 |