Re: Tuplesort merge pre-reading

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Tuplesort merge pre-reading
Date:	2016-09-28 18:12:26
Message-ID:	de211a24-edda-d8b3-567e-a1610eb721c6@iki.fi
Lists:	pgsql-hackers

On 09/28/2016 07:11 PM, Peter Geoghegan wrote:
> On Wed, Sep 28, 2016 at 5:04 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> Not sure that I understand. I agree that each merge pass tends to use
>>> roughly the same number of tapes, but the distribution of real runs on
>>> tapes is quite unbalanced in earlier merge passes (due to dummy runs).
>>> It looks like you're always using batch memory, even for non-final
>>> merges. Won't that fail to be in balance much of the time because of
>>> the lopsided distribution of runs? Tapes have an uneven amount of real
>>> data in earlier merge passes.
>>
>>
>> How does the distribution of the runs on the tapes matter?
>
> The exact details are not really relevant to this discussion (I think
> it's confusing that we simply say "Target Fibonacci run counts",
> FWIW), but the simple fact that it can be quite uneven is.

Well, I claim that the uneven distribution of runs does not matter.
Can you explain why you think it does?

> This is why I never pursued batch memory for non-final merges. Isn't
> that what you're doing here? You're pretty much always setting
> "state->batchUsed = true".

Yep. As the patch stands, we wouldn't really need batchUsed, as we know
that it's always true when merging, and false otherwise. But I kept it,
as it seems like that might not always be true - we might use batch
memory when building the initial runs, for example - and because it
seems nice to have an explicit flag for it, for readability and
debugging purposes.

>>> I'm basically repeating myself here, but: I think it's incorrect that
>>> LogicalTapeAssignReadBufferSize() is called so indiscriminately (more
>>> generally, it is questionable that it is called in such a high level
>>> routine, rather than the start of a specific merge pass -- I said so a
>>> couple of times already).
>>
>>
>> You can't release the tape buffer at the end of a pass, because the buffer
>> of a tape will already be filled with data from the next run on the same
>> tape.
>
> Okay, but can't you just not use batch memory for non-final merges,
> per my initial approach? That seems far cleaner.

Why? I don't see why the final merge should behave differently from the
non-final ones.
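
To make the tape buffer point above concrete, here is a toy model, not
the actual logtape.c code. Every name in it (ToyTape, toy_tape_fill,
toy_consume_run, TOY_BUF_LEN) is made up for illustration, and it
assumes that reads from a tape are done in fixed-size chunks that do not
line up with run boundaries. Under that assumption, once the last tuple
of one run has been consumed, the buffer can already hold tuples of the
next run on the same tape, so throwing the buffer away at the end of a
pass would lose (or force a re-read of) that data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUF_LEN 4			/* hypothetical read-buffer size, in tuples */

/* Hypothetical toy tape: several sorted runs stored back to back. */
typedef struct ToyTape
{
	const int  *data;			/* all tuples on the tape */
	size_t		ntuples;		/* total number of tuples on the tape */
	size_t		next_fetch;		/* next on-tape position to read */
	int			buf[TOY_BUF_LEN];	/* read buffer */
	size_t		buf_len;		/* valid tuples in buf */
	size_t		buf_pos;		/* next tuple in buf to hand out */
} ToyTape;

static void
toy_tape_init(ToyTape *t, const int *data, size_t n)
{
	assert(data != NULL || n == 0);
	t->data = data;
	t->ntuples = n;
	t->next_fetch = 0;
	t->buf_len = 0;
	t->buf_pos = 0;
}

/* Refill the buffer with the next chunk; chunks ignore run boundaries. */
static void
toy_tape_fill(ToyTape *t)
{
	size_t		n = 0;

	while (n < TOY_BUF_LEN && t->next_fetch < t->ntuples)
		t->buf[n++] = t->data[t->next_fetch++];
	t->buf_len = n;
	t->buf_pos = 0;
}

/* Fetch the next tuple from the tape; returns false at end of tape. */
static bool
toy_tape_next(ToyTape *t, int *out)
{
	if (t->buf_pos == t->buf_len)
	{
		toy_tape_fill(t);
		if (t->buf_len == 0)
			return false;
	}
	*out = t->buf[t->buf_pos++];
	return true;
}

/*
 * Consume one run of run_len tuples, as a merge pass would, and report
 * how many tuples are still sitting in the buffer afterwards.  Those
 * leftovers belong to the next run on the same tape.
 */
static size_t
toy_consume_run(ToyTape *t, size_t run_len)
{
	int			v;

	for (size_t i = 0; i < run_len; i++)
	{
		if (!toy_tape_next(t, &v))
			break;
	}
	return t->buf_len - t->buf_pos;
}
```

With a tape holding the runs {1, 5, 9} and {2, 4, 8}, the first fill
reads four tuples, so after the first run is consumed one tuple of the
second run is still in the buffer.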

- Heikki

In response to

Re: Tuplesort merge pre-reading at 2016-09-28 16:11:52 from Peter Geoghegan