Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-02-03 19:12:42
Message-ID: CAH2-WzkU2xK2dpZ7N8-A1MvuUTTUvhqkfnA+eUtwNwCtgyCJgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> I applied your v2 patch on top of
> 7ac4a389a7dbddaa8b19deb228f0a988e79c5795^ to avoid a conflict. It
> still had a couple of harmless conflicts that I was able to deal with
> (not code, just some header stuff moving around).

You must mean my V7 patch. FWIW, I've resolved the conflicts with
7ac4a389a7dbddaa8b19deb228f0a988e79c5795 in my own private branch, and
have worked through some of the open items that you raised.

> See full results from all permutations attached, but I wanted to
> highlight the measurements from 'textwide', 'u', 'nonu' which show
> interesting 'asc' numbers (data already sorted). The 'mem' column is
> maintenance_work_mem in megabytes. The 'w = 0' column shows the time
> in seconds for parallel_workers = 0. The other 'w = N' columns show
> times with higher parallel_workers settings, represented as speed-up
> relative to the 'w = 0' time.

The thing to keep in mind about testing presorted cases in tuplesort
in general is that we have this weird precheck for presorted input in
our qsort. This is something added by us to the original Bentley &
McIlroy algorithm in 2006. I am very skeptical of this addition, in
general. It tends to have the effect of highly distorting how
effective most optimizations are for presorted cases, which comes up
again and again. It only works when the input is *perfectly*
presorted, but can throw away an enormous amount of work when the last
tuple of input is out of order -- that will throw away all work before
that point (not so bad when you think your main cost is comparisons
rather than memory accesses, but that isn't the case).

Your baseline case can either be made unrealistically fast due to the
fact that you get a perfectly sympathetic case for this optimization,
or unrealistically slow (very CPU bound) due to the fact that you have
that one last tuple out of place. I once said that this last tuple can
act like a discarded banana skin.

There is nothing wrong with the idea of exploiting presortedness, and
to some extent the original algorithm does that (by using insertion
sort), but an optimization along the lines of Timsort's "galloping
mode" (which is what this modification of ours attempts) requires
non-trivial bookkeeping to do right.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-03 19:42:27 Re: Add pgstathashindex() to get hash index table statistics.
Previous Message Pavel Stehule 2017-02-03 19:09:30 Re: Packages: Again