Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeremy Harris <jgh(at)wizmail(dot)org>
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date: 2015-08-03 21:32:40
Message-ID: CAM3SWZRufkF3yH+Ja=T3-gqi6JYs42pJX8J=dLnjwie2U5ZJyg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 3, 2015 at 1:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't think that's what Heikki is talking about. He can correct me
> if I'm wrong, but what I think he's saying is that we should try to
> exploit the fact that we've already determined which in-memory tuples
> can be part of the current run and which in-memory tuples must become
> part of the next run. Suppose half the tuples in memory can become
> part of the current run and the other half must become part of the
> next run. If we throw all of those tuples into a single bucket and
> quicksort it, we're losing the benefit of the comparisons we did to
> figure out which tuples go in which runs. Instead, we could quicksort
> the current-run tuples and, separately, quick-sort the next-run
> tuples. Ignore the current-run tuples completely until the tape is
> empty, and then merge them with any next-run tuples that remain.

Oh. Well, the benefit of "the comparisons we did to figure out which
tuples go in which runs" is that they let us determine whether this
optimization applies at all. Also, by keeping run 1 (if any) usually in
memory, and run 0 partially on disk, we avoid having to worry about run
1 spoiling the optimization. (In the current "single run optimization",
dumping all tuples can make us "acknowledge" run 1 (i.e. currentRun++),
preventing the single-run optimization; the patch handily avoids that.)
Finally, it saves us a bunch of real COMPARETUP() comparisons, since
HEAPCOMPARE() is called as tuples are inserted into the still-heapified
memtuples array.

> I'm not sure if there's any reason to believe that would be faster
> than your approach. In general, sorting is O(n lg n) so sorting two
> arrays that are each half as large figures to be slightly faster than
> sorting one big array. But the difference may not amount to much.

IMV, the smart way of avoiding wasting "the comparisons we did to
figure out which tuples go in which runs" is to rig HEAPCOMPARE() to
only do a real COMPARETUP() for tuples in the currentRun, and to make
sure we don't forget to undo that if we don't end up quicksorting.

The second run that is in memory can only consist of whatever tuples
were added after heapification that compared less than a previously
observed tuple (a big majority, usually). So, like you, I can't see any
of these techniques helping much, not even my "smart" technique. Maybe
I should look at a case involving text or something to be sure.

Thinking about it some more, I don't think it would be easy to
maintain a clear separation between run 0 and run 1 in the memtuples
array in terms of a cutoff point. It's still a heap at that stage, of
course. You'd have to rig each tuple comparator so that COMPARETUP()
cared about tupindex before comparing datum1 just for this, which
seems rather unappealing.

--
Peter Geoghegan
