From: | Jim Nasby <jim(at)nasby(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Memory usage during sorting |
Date: | 2012-03-20 20:26:31 |
Message-ID: | 4F68E7F7.6080004@nasby.net |
Lists: | pgsql-hackers |
On 3/18/12 10:25 AM, Tom Lane wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>> Anyway, I think the logtape could use redoing.
>> The problem there is that none of the files can be deleted until it
>> was entirely read, so you end up with all the data on disk twice. I
>> don't know how often people run their databases so close to the edge
>> on disk space that this matters, but someone felt that that extra
>> storage was worth avoiding.
> Yeah, that was me, and it came out of actual user complaints ten or more
> years back. (It's actually not 2X growth but more like 4X growth
> according to the comments in logtape.c, though I no longer remember the
> exact reasons why.) We knew when we put in the logtape logic that we
> were trading off speed for space, and we accepted that. It's possible
> that with the growth of hard drive sizes, real-world applications would
> no longer care that much about whether the space required to sort is 4X
> data size rather than 1X. Or then again, maybe their data has grown
> just as fast and they still care.
>
I believe the case of tape sorts that fit entirely in filesystem cache is a big one as well... doubling or worse the amount of data that needed to live "on disk" at once would likely suck in that case.
Also, it's not uncommon to be IO-bound on a database server... so even if we're not worried about storing everything 2 or more times from a disk space standpoint, we should be concerned about the IO bandwidth.
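To make the space trade-off discussed above concrete, here is a minimal sketch of a single merge pass under an idealized model. It is not PostgreSQL's logtape.c code: the function name `peak_disk_blocks`, the round-robin draining of runs, and the block counts are all assumptions made for illustration. It only models the "data on disk twice" case from the quoted text and does not attempt to reproduce the 4X figure Tom mentions.

```python
def peak_disk_blocks(run_sizes, recycle_blocks):
    """Simulate one merge pass and return the peak number of blocks on disk.

    run_sizes: number of blocks in each sorted input run (hypothetical data).
    recycle_blocks: if True, an input block is freed as soon as it has been
    read (block-recycling approach); if False, an input run's file is
    deleted only after the whole run has been read.
    """
    remaining = list(run_sizes)      # unread blocks per run
    on_disk_input = sum(run_sizes)   # input blocks still occupying disk
    on_disk_output = 0               # merged output blocks written so far
    peak = on_disk_input

    # Round-robin over the runs to mimic a merge that drains them evenly.
    while any(remaining):
        for i, left in enumerate(remaining):
            if left == 0:
                continue
            remaining[i] -= 1
            if recycle_blocks:
                on_disk_input -= 1             # block freed immediately
            elif remaining[i] == 0:
                on_disk_input -= run_sizes[i]  # whole file deleted at the end
            on_disk_output += 1                # one merged block written
            peak = max(peak, on_disk_input + on_disk_output)
    return peak


if __name__ == "__main__":
    runs = [100, 100, 100, 100]
    print("recycling blocks:", peak_disk_blocks(runs, recycle_blocks=True))
    print("whole-file delete:", peak_disk_blocks(runs, recycle_blocks=False))
```

In this model, recycling keeps the footprint at the size of the data, while deleting whole files only after they are fully read lets the footprint approach twice the data size just before the inputs are exhausted. The same extra blocks also have to fit in filesystem cache or be written out, which is the IO concern raised above.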
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net