Re: Using quicksort for every external sort run

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-06 23:59:25
Message-ID: CAMkU=1x29QJV-5USH3mS_J8ztZL5J5x0_KLku2QgkDOCiV0cOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 29, 2015 at 2:01 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> So there was 27.76 seconds spent copying tuples into local memory
>> ahead of the quicksort, 2 minutes 56.68 seconds spent actually
>> quicksorting, and a trifling 10.32 seconds actually writing the run! I
>> bet that the quicksort really didn't use up too much memory bandwidth
>> on the system as a whole, since abbreviated keys are used with a cache
>> oblivious internal sorting algorithm.
>
> Uh, actually, that isn't so:
>
> LOG: begin index sort: unique = f, workMem = 1048576, randomAccess = f
> LOG: bttext_abbrev: abbrev_distinct after 160: 1.000489
> (key_distinct: 40.802210, norm_abbrev_card: 0.006253, prop_card:
> 0.200000)
> LOG: bttext_abbrev: aborted abbreviation at 160 (abbrev_distinct:
> 1.000489, key_distinct: 40.802210, prop_card: 0.200000)
>
> Abbreviation is aborted in all cases that you tested. Arguably this
> should happen significantly less frequently with the "C" locale,
> possibly almost never, but it makes this case less than representative
> of most people's workloads. I think that at least the first several
> hundred leading attribute tuples are duplicates.

I guess I wasn't paying sufficient attention to that part of
trace_sort, I was not familiar enough with the abbreviation feature to
interpret what it meant. I had thought we used 16 bytes for
abbreviation, but now I see it is only 8 bytes.

My column has the format of ABC-123-456-789-0

The name-space identifier ("ABC-") is the same in 99.99% of the
cases. And to date, as well as in my extrapolation, the first two
digits of the numeric part are leading zeros and the third one is
mostly 0,1,2. So the first 8 bytes really have less than 2 bits worth
of information. So yeah, not surprising abbreviation was not useful.

(When I created the system, I did tests that showed it doesn't make
much difference whether I used the format natively, or stripped it to
something more compact on input and reformatted it on output. That
was before abbreviation features existed)

>
> BTW, roughly what does this CREATE INDEX look like? Is it a composite
> index, for example?

Nope, just a single column index. In the extrapolated data set, each
distinct value shows up a couple hundred times on average. I'm
thinking of converting it to a btree_gin index once I've tested them a
bit more, as the compression benefits are substantial.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-12-07 00:04:05 Re: Using quicksort for every external sort run
Previous Message Tomas Vondra 2015-12-06 22:48:51 Re: PATCH: index-only scans with partial indexes