From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com> |
Subject: | Re: Using quicksort for every external sort run |
Date: | 2015-12-07 00:04:05 |
Message-ID: | CAM3SWZQHJS=Bg9esOZo7V3nXTjEiAMyjVmjJyk-feLz13m6b8g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Dec 6, 2015 at 3:59 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> My column has the format of ABC-123-456-789-0
>
> The name-space identifier ("ABC-") is the same in 99.99% of the
> cases. And to date, as well as in my extrapolation, the first two
> digits of the numeric part are leading zeros and the third one is
> mostly 0,1,2. So the first 8 bytes really have less than 2 bits worth
> of information. So yeah, not surprising abbreviation was not useful.
I think that given you're using the "C" collation, abbreviation should
still go ahead. I posted a patch to do that, which I need to further
justify per Robert's request (currently, we do nothing special based
on collation). Abbreviation should help in surprisingly marginal
cases, since far fewer memory accesses will be required in the early
stages of the sort with only (say) 5 distinct abbreviated keys. Once
abbreviated comparisons start to not help at all (with quicksort, at
some partition), there's a good chance that the full keys can be
reused to some extent, before being evicted from CPU caches.
>> BTW, roughly what does this CREATE INDEX look like? Is it a composite
>> index, for example?
>
> Nope, just a single column index. In the extrapolated data set, each
> distinct value shows up a couple hundred times on average. I'm
> thinking of converting it to a btree_gin index once I've tested them a
> bit more, as the compression benefits are substantial.
Unfortunately, that cannot use tuplesort.c at all.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2015-12-07 00:25:23 | Re: Using quicksort for every external sort run |
Previous Message | Jeff Janes | 2015-12-06 23:59:25 | Re: Using quicksort for every external sort run |