Re: Question about maxTapes & selectnewtape & dumptuples

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andy Fan <zhihuifan1213(at)163(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about maxTapes & selectnewtape & dumptuples
Date: 2024-06-30 10:22:21
Message-ID: 72fe610e-ecfb-48d4-b966-083c5bd20c9e@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30/06/2024 12:48, Andy Fan wrote:
> merge sorts requires all the tuples in each input are pre-sorted (1),
> and in tuplesort.c, when the workmem is full, we dumptuples into
> destTape. (2). it looks to me that we need a unlimited number of Tapes
> if the number of input tuples is big enough. However we set a maxTapes
> in 'inittapes' and we may use a pre-used tapes for storing new tuples in
> 'selectnewtape'. Wouldn't this break the prerequisite (1)?
>
> selectnewtape(Tuplesortstate *state)
> {
> if (state->nOutputTapes < state->maxTapes)
> {..}
> else
> {
> /*
> * We have reached the max number of tapes. Append to an existing
> * tape.
> */
> state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes];
> state->nOutputRuns++;
> }
> }
>
>
> for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7),
> and later (2, 4, 6, 8) are put into it. so the overall of (1, 3, 5, 7,
> 2, 4, 6, 8) are not sorted? Where did I go wrong?

There's a distinction between "runs" and "tapes". Each "run" is sorted,
but there can be multiple runs on a tape. In that case, multiple merge
passes are needed. Note the call to markrunend() between the runs. In
your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4,
6, 8).

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2024-06-30 10:22:28 Re: Optimize numeric.c mul_var() using the Karatsuba algorithm
Previous Message Andy Fan 2024-06-30 09:48:44 Question about maxTapes & selectnewtape & dumptuples