From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Andy Fan <zhihuifan1213(at)163(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Question about maxTapes & selectnewtape & dumptuples |
Date: | 2024-06-30 10:22:21 |
Message-ID: | 72fe610e-ecfb-48d4-b966-083c5bd20c9e@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 30/06/2024 12:48, Andy Fan wrote:
> merge sorts requires all the tuples in each input are pre-sorted (1),
> and in tuplesort.c, when the workmem is full, we dumptuples into
> destTape. (2). it looks to me that we need a unlimited number of Tapes
> if the number of input tuples is big enough. However we set a maxTapes
> in 'inittapes' and we may use a pre-used tapes for storing new tuples in
> 'selectnewtape'. Wouldn't this break the prerequisite (1)?
>
> selectnewtape(Tuplesortstate *state)
> {
> if (state->nOutputTapes < state->maxTapes)
> {..}
> else
> {
> /*
> * We have reached the max number of tapes. Append to an existing
> * tape.
> */
> state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes];
> state->nOutputRuns++;
> }
> }
>
>
> for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7),
> and later (2, 4, 6, 8) are put into it. so the overall of (1, 3, 5, 7,
> 2, 4, 6, 8) are not sorted? Where did I go wrong?
There's a distinction between "runs" and "tapes". Each "run" is sorted,
but there can be multiple runs on a tape. In that case, multiple merge
passes are needed. Note the call to markrunend() between the runs. In
your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4,
6, 8).
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Joel Jacobson | 2024-06-30 10:22:28 | Re: Optimize numeric.c mul_var() using the Karatsuba algorithm |
Previous Message | Andy Fan | 2024-06-30 09:48:44 | Question about maxTapes & selectnewtape & dumptuples |