Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-09-07 05:51:05
Message-ID: e8f44b63-4745-b855-7772-e8201906a4a1@iki.fi
Lists: pgsql-hackers

On 09/07/2016 12:46 AM, Peter Geoghegan wrote:
> On Tue, Sep 6, 2016 at 12:34 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Why do we reserve the buffer space for all the tapes right at the beginning?
>> Instead of the single USEMEM(maxTapes * TAPE_BUFFER_OVERHEAD) call in
>> inittapes(), couldn't we call USEMEM(TAPE_BUFFER_OVERHEAD) every time we
>> start a new run, until we reach maxTapes?
>
> No, because then you have no way to clamp back memory, which is now
> almost all used (we hold off from making LACKMEM() continually true,
> if at all possible, which is almost always the case). You can't really
> continually shrink memtuples to make space for new tapes, which is
> what it would take.

I still don't get it. When building the initial runs, we don't need
buffer space for maxTapes yet, because we're only writing to a single
tape at a time. An unused tape shouldn't take much memory. In
inittapes(), when we have built all the runs, we know how many tapes we
actually needed, and we can allocate the buffer memory accordingly.

[thinks a bit, looks at logtape.c]. Hmm, I guess that's wrong, because
of the way this all is implemented. When we're building the initial
runs, we're only writing to one tape at a time, but logtape.c
nevertheless holds onto a BLCKSZ-sized currentBuffer, plus one buffer for
each indirect level, for every tape that has been used so far. What if
we changed LogicalTapeRewind to free those buffers? Flush out the
indirect buffers to disk, remembering just the physical block number of
the topmost indirect block in memory, and free currentBuffer. That way,
a tape that has been used, but isn't being read or written to at the
moment, would take very little memory, and we wouldn't need to reserve
space for them in the build-runs phase.
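
To make the idea concrete, here is a rough, standalone sketch of the accounting I have in mind. It is not logtape.c code: SketchTape, SKETCH_BLCKSZ and the sketch_* functions are all made-up names for illustration, the "flush" is only pretended, and indirect-level buffers are collapsed into a single remembered block number. It only shows that a parked tape holds no buffer, so buffer memory tracks the number of active tapes rather than the number of tapes used so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BLCKSZ 8192		/* assumed stand-in for BLCKSZ */

/* Hypothetical, simplified tape; not the real LogicalTape struct. */
typedef struct SketchTape
{
	char	   *currentBuffer;		/* NULL while the tape is parked */
	long		topIndirectBlock;	/* block number kept while parked, -1 if none */
} SketchTape;

/* Total bytes of tape buffers currently allocated. */
static size_t sketch_mem_in_use = 0;

void
sketch_tape_init(SketchTape *tape)
{
	tape->currentBuffer = NULL;
	tape->topIndirectBlock = -1;
}

/* Start reading or writing: allocate the buffer only if we lack one. */
void
sketch_tape_activate(SketchTape *tape)
{
	if (tape->currentBuffer == NULL)
	{
		tape->currentBuffer = malloc(SKETCH_BLCKSZ);
		assert(tape->currentBuffer != NULL);
		sketch_mem_in_use += SKETCH_BLCKSZ;
	}
}

/*
 * Park the tape, as a changed rewind might: a real implementation would
 * flush the buffers to disk here. We only remember the topmost indirect
 * block number (supplied by the caller) and free the buffer.
 */
void
sketch_tape_park(SketchTape *tape, long topBlock)
{
	if (tape->currentBuffer != NULL)
	{
		free(tape->currentBuffer);
		tape->currentBuffer = NULL;
		sketch_mem_in_use -= SKETCH_BLCKSZ;
	}
	tape->topIndirectBlock = topBlock;
}

size_t
sketch_buffer_memory(void)
{
	return sketch_mem_in_use;
}
```

With something like this, writing run 1 to tape 0, parking it, and then writing run 2 to tape 1 never has more than one SKETCH_BLCKSZ buffer allocated at a time, which is the property that would let us skip the up-front reservation while building runs.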

- Heikki
