Re: Using quicksort for every external sort run

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-14 03:31:58
Message-ID: CAMkU=1w5us=5UnJR=AXhe=t+hOV9J_wgEy7SVy21bE2SHzPLAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 13, 2015 at 3:40 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Sat, Dec 12, 2015 at 4:41 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>>>> Also, if I am reading this correctly, when we refill a pool from a
>>>> logical tape we still transform each tuple as it is read from the disk
>>>> format to the memory format. This inflates the size quite a bit, at
>>>> least for single-datum tuples. If we instead just read the disk
>>>> format directly into the pool, and converted them into the in-memory
>>>> format when each tuple came due for the merge heap, would that destroy
>>>> the locality of reference you are seeking to gain?
>>>
>>> Are you talking about alignment?
>>
>> Maybe alignment, but also the size of the SortTuple struct itself,
>> which is not present on tape but is present in memory if I understand
>> correctly.
>>
>> When reading 128kb (32 blocks) worth of in-memory pool, it seems like
>> it only gets to read 16 to 18 blocks of tape to fill them up, in the
>> case of building an index on single column 32-byte random md5 digests.
>> I don't exactly know where all of that space goes, I'm taking an
>> experimentalist approach.
>
> I'm confused.
>
> readtup_datum(), just like every other READTUP() variant, has the new
> function tupproperalloc() as a drop-in replacement for the master
> branch palloc() + USEMEM() calls.

Right, I'm not comparing what your patch does to what the existing
code does. I'm comparing it to what it could be doing. Only call
READTUP when you need to go from the pool to the heap, not when you
need to go from tape to the pool. If you store the data in the pool
the same way they are stored on tape, then we no longer need memtuples
at all. There is already a "mergetupcur" per tape pointing to the
first tuple of the tape, and since they are now stored contiguously
that is all that is needed, once you are done with one tuple the
pointer is left pointing at the next one.

The reason for memtuples is to handle random access. Since we are no
longer doing random access, we no longer need it.

We could free memtuples, re-allocate just enough to form the binary
heap for the N-way merge, and use all the rest of that space (which
could be a significant fraction of work_mem) as part of the new pool.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-12-14 03:34:53 Re: Disabling an index temporarily
Previous Message Craig Ringer 2015-12-14 03:28:51 [PATCH] Logical decoding support for sequence advances