From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie>, Kefan Yang <starordust(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: GSOC 2018 Project - A New Sorting Routine |
Date: | 2018-07-14 21:20:41 |
Message-ID: | 00147235-69d3-d4f0-36c1-198d630af7ad@2ndquadrant.com |
Lists: | pgsql-hackers |
On 07/14/2018 12:10 AM, Peter Geoghegan wrote:
> On Fri, Jul 13, 2018 at 3:04 PM, Kefan Yang <starordust(at)gmail(dot)com> wrote:
>> 1. Slow on CREATE INDEX cases.
>>
>> I am still trying to figure out where the bottleneck is. Is the data pattern
>> in index creation very different from other cases? Also, pg_qsort has
>> 10%-20% advantage at creating index even on sorted data (faster CPU, N =
>> 1000000). This is very strange to me since the two sorting routines execute
>> exactly the same code when the input data is sorted.
>
> Yes. CREATE INDEX uses heap TID as a tie-breaker, so it's impossible
> for any two index tuples to compare as equal within tuplesort.c, even
> though they may be equal in other contexts. This is likely to defeat
> things like the Bentley-McIlroy optimization where equal keys are
> swapped, which is very effective in the event of many equal keys.
>
> (Could also be parallelism, though I suppose you probably accounted for that.)
>
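To make Peter's point concrete, here is a minimal sketch (plain Python, not PostgreSQL code) of how a unique tie-breaker removes every "equal" comparison result. The (key, tid) tuples, the helper name make_counting_cmp and the row counts are all made up for illustration; it only counts comparator outcomes and does not model pg_qsort or tuplesort.c.

```python
from functools import cmp_to_key


def make_counting_cmp(use_tid_tiebreak):
    """Return a comparator over (key, tid) pairs plus a dict of counters.

    Hypothetical helper: 'tid' stands in for the heap TID that
    CREATE INDEX uses as a tie-breaker.
    """
    stats = {"total": 0, "equal": 0}

    def cmp(a, b):
        stats["total"] += 1
        # Compare on the indexed key first.
        result = (a[0] > b[0]) - (a[0] < b[0])
        # Optionally break ties on the (unique) tid.
        if result == 0 and use_tid_tiebreak:
            result = (a[1] > b[1]) - (a[1] < b[1])
        if result == 0:
            stats["equal"] += 1
        return result

    return cmp, stats


# 1000 rows, only 10 distinct keys, every tid unique.
rows = [(i % 10, i) for i in range(1000)]

cmp_plain, plain_stats = make_counting_cmp(use_tid_tiebreak=False)
sorted_plain = sorted(rows, key=cmp_to_key(cmp_plain))

cmp_tid, tid_stats = make_counting_cmp(use_tid_tiebreak=True)
sorted_tid = sorted(rows, key=cmp_to_key(cmp_tid))

print("key only:      ", plain_stats)
print("key + tid tie: ", tid_stats)
```

With the tie-breaker enabled the comparator never reports equality, so any optimization that relies on detecting equal keys has nothing to work with, even though the keys themselves are heavily duplicated.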
Hmmm. Those scripts are older than max_parallel_maintenance_workers, so
they were only setting the regular max_parallel_workers_per_gather GUC.
OTOH these tests were done on fairly small data sets, starting from 10k
rows, and the 10-20% regression is clearly visible at all scales (we
don't use parallel CREATE INDEX for tiny tables, right?). And it's not
visible on the i5 CPU at all, which would be a bit strange if it were
parallelism-related.

So I doubt it's this, but I've tweaked the scripts to also set
max_parallel_maintenance_workers and restarted the tests on both
machines. Let's see what that does.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services