Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-20 01:42:51
Message-ID: CAM3SWZQKMQPeCZEe8iPBn+ohAa6TkM2jYY0fBBQK706ji6o+GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 19, 2015 at 5:32 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> Hm. Have you tested a nearly-sorted input set around 1.5x the size of
> work_mem? That should produce a single run using the heap to generate
> runs but generate two runs if, AIUI, you're just filling work_mem,
> running quicksort, dumping that run entirely and starting fresh.

Yes. Actually, even with a random ordering, on average replacement
selection sort will produce runs twice as long as the patch series.
With nearly ordered input, there is no limit to how log runs can be --
you could definitely have cases where *no* merge step is required. We
just return tuples from one long run. And yet, it isn't worth it in
cases that I tested.

Please don't take my word for it -- try yourself.
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2015-11-20 02:00:02 Re: Foreign join pushdown vs EvalPlanQual
Previous Message Andres Freund 2015-11-20 01:42:16 Re: Typo in file header comment of replorigindesc.c