Re: Using quicksort for every external sort run

From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2016-01-30 07:27:21
Message-ID: CAM-w4HORqMqS1R3XSW_7gS6=9RXe=xovj5SWwXnPXJR2Mt7niQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 Jan 2016 11:58 pm, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> It
> seems pretty easy to construct cases where this technique regresses,
> and a large percentage of those cases are precisely those where
> replacement selection would have produced a single run, avoiding the
> merge step altogether.

Now that avoiding the merge phase altogether didn't necessarily represent
any actual advantage.

We don't find out we've avoided the merge phase until the entire run has
been spiked to disk. Then we need to read it back in from disk to serve up
those tuples.

If we have tapes to merge but can do then in a single pass we do that
lazily and merge as needed when we serve up the tuples. I doubt there's any
speed difference in reading two sequential streams with our buffering over
one especially in the midst of a quiet doing other i/o. And N extra
comparisons is less than the quicksort advantage.

If we could somehow predict that it'll be a single output run that would be
a huge advantage. But having to spill all the tuples and then find out
isn't really helpful.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2016-01-30 07:29:14 Re: Using quicksort for every external sort run
Previous Message Peter Geoghegan 2016-01-30 07:25:38 Re: Using quicksort for every external sort run