From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Tuplesort merge pre-reading |
Date: | 2016-09-28 16:04:48 |
Message-ID: | 0c0b80fc-9dea-c031-ce51-2781edefad4d@iki.fi |
Lists: | pgsql-hackers |
On 09/28/2016 06:05 PM, Peter Geoghegan wrote:
> On Thu, Sep 15, 2016 at 9:51 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> I don't think it makes much difference in practice, because most merge
>> passes use all, or almost all, of the available tapes. BTW, I think the
>> polyphase algorithm prefers to do all the merges that don't use all tapes
>> upfront, so that the last final merge always uses all the tapes. I'm not
>> 100% sure about that, but that's my understanding of the algorithm, and
>> that's what I've seen in my testing.
>
> Not sure that I understand. I agree that each merge pass tends to use
> roughly the same number of tapes, but the distribution of real runs on
> tapes is quite unbalanced in earlier merge passes (due to dummy runs).
> It looks like you're always using batch memory, even for non-final
> merges. Won't that fail to be in balance much of the time because of
> the lopsided distribution of runs? Tapes have an uneven amount of real
> data in earlier merge passes.

How does the distribution of the runs on the tapes matter?

>> +    usedBlocks = 0;
>> +    for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
>> +    {
>> +        int64       numBlocks = blocksPerTape + (tapenum < remainder ? 1 : 0);
>> +
>> +        if (numBlocks > MaxAllocSize / BLCKSZ)
>> +            numBlocks = MaxAllocSize / BLCKSZ;
>> +        LogicalTapeAssignReadBufferSize(state->tapeset, tapenum,
>> +                                        numBlocks * BLCKSZ);
>> +        usedBlocks += numBlocks;
>> +    }
>> +    USEMEM(state, usedBlocks * BLCKSZ);
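For anyone following along, the arithmetic in that hunk can be tried out in isolation. This is only a rough standalone sketch, not the patch itself: the hunk does not show how blocksPerTape and remainder are computed, so I am assuming they are the quotient and remainder of the available blocks divided by maxTapes. The helper names and the SKETCH_MAX_ALLOC_SIZE macro are made up for illustration; the values mirror the stock MaxAllocSize (0x3fffffff) and the default 8 kB BLCKSZ.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192                                   /* default block size */
#define SKETCH_MAX_ALLOC_SIZE ((int64_t) 0x3fffffff)  /* mirrors MaxAllocSize */

/*
 * Hypothetical helper: number of read-buffer blocks given to one tape.
 * ASSUMPTION: blocksPerTape and remainder are derived by dividing the
 * available blocks evenly among the tapes; the quoted hunk only shows
 * how they are used, not how they are computed.
 */
static int64_t
tape_buffer_blocks(int64_t availBlocks, int maxTapes, int tapenum)
{
    int64_t blocksPerTape;
    int64_t remainder;
    int64_t numBlocks;

    assert(maxTapes > 0 && tapenum >= 0 && tapenum < maxTapes);

    blocksPerTape = availBlocks / maxTapes;
    remainder = availBlocks % maxTapes;

    /* The first 'remainder' tapes get one extra block each. */
    numBlocks = blocksPerTape + (tapenum < remainder ? 1 : 0);

    /* A single buffer cannot be larger than one allocation. */
    if (numBlocks > SKETCH_MAX_ALLOC_SIZE / BLCKSZ)
        numBlocks = SKETCH_MAX_ALLOC_SIZE / BLCKSZ;

    return numBlocks;
}

/* Hypothetical helper: total blocks handed out, i.e. what gets charged. */
static int64_t
total_buffer_blocks(int64_t availBlocks, int maxTapes)
{
    int64_t usedBlocks = 0;
    int     tapenum;

    for (tapenum = 0; tapenum < maxTapes; tapenum++)
        usedBlocks += tape_buffer_blocks(availBlocks, maxTapes, tapenum);

    return usedBlocks;
}
```

With 10 blocks and 4 tapes, tapes 0 and 1 get 3 blocks and tapes 2 and 3 get 2. Nothing in this calculation looks at how many runs a tape currently holds; every tape gets the same share regardless.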
>
> I'm basically repeating myself here, but: I think it's incorrect that
> LogicalTapeAssignReadBufferSize() is called so indiscriminately (more
> generally, it is questionable that it is called in such a high level
> routine, rather than the start of a specific merge pass -- I said so a
> couple of times already).

You can't release the tape buffer at the end of a pass, because the
buffer of a tape will already be filled with data from the next run on
the same tape.
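
To illustrate with a toy model (this is not the logtape.c code; ToyTape and its functions are invented for this example, and it assumes the runs on a tape are stored back to back and that the buffer is refilled in fixed-size chunks without regard to run boundaries):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAPACITY 4          /* toy read-ahead buffer size, in items */

/* Hypothetical toy tape: several sorted runs stored back to back. */
typedef struct ToyTape
{
    const int  *data;           /* all runs on this tape, concatenated */
    size_t      len;            /* total number of items on the tape */
    size_t      next_read;      /* next tape position to load */
    int         buf[BUF_CAPACITY];
    size_t      buf_len;        /* number of valid items in buf */
    size_t      buf_pos;        /* next buffered item to hand out */
} ToyTape;

static void
toy_tape_init(ToyTape *t, const int *data, size_t len)
{
    t->data = data;
    t->len = len;
    t->next_read = 0;
    t->buf_len = 0;
    t->buf_pos = 0;
}

/* Fill the buffer with as much as fits, ignoring run boundaries. */
static void
toy_tape_refill(ToyTape *t)
{
    size_t      n = t->len - t->next_read;

    if (n > BUF_CAPACITY)
        n = BUF_CAPACITY;
    memcpy(t->buf, t->data + t->next_read, n * sizeof(int));
    t->next_read += n;
    t->buf_len = n;
    t->buf_pos = 0;
}

/* Fetch the next item; returns 0 at end of tape, 1 otherwise. */
static int
toy_tape_next(ToyTape *t, int *item)
{
    if (t->buf_pos == t->buf_len)
    {
        if (t->next_read == t->len)
            return 0;
        toy_tape_refill(t);
    }
    *item = t->buf[t->buf_pos++];
    return 1;
}

/* Items already read from the tape but not yet consumed. */
static size_t
toy_tape_buffered(const ToyTape *t)
{
    return t->buf_len - t->buf_pos;
}
```

Put two three-item runs, {1, 3, 5} and {2, 4, 6}, on one toy tape and consume the first run: the first refill has already pulled in the 2 from the second run, so toy_tape_buffered() returns 1 at the point where the pass ends. Throwing the buffer away there would lose that item, or force it to be read again.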
- Heikki