Re: Re: Abbreviated keys for Datum tuplesort

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Re: Abbreviated keys for Datum tuplesort
Date:	2015-02-20 20:57:00
Message-ID:	54E79F9C.4090208@2ndquadrant.com
Lists:	pgsql-hackers

On 25.1.2015 12:15, Andrew Gierth wrote:
>
> So given some suitable test data, such as
>
> create table stuff as select random()::text as randtext
> from generate_series(1,1000000); -- or however many rows
>
> you can do
>
> select percentile_disc(0) within group (order by randtext) from stuff;
>
> or
>
> select count(distinct randtext) from stuff;
>
> The performance improvements I saw were pretty much exactly as
> expected from the improvement in the ORDER BY and CREATE INDEX cases.

I've spent a fair amount of time testing this today, and when using the
simple percentile_disc example mentioned above, I see this pattern:

                              master   patched   speedup
---------------------------------------------------------
generate_series(1,1000000)       4.2       0.7      6
generate_series(1,2000000)       9.2       9.8      0.93
generate_series(1,3000000)      14.5      15.3      0.95

So for a small dataset the speedup is very nice, but for larger sets
there's a ~5% slowdown. Is this expected?
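
For reference, the speedup column is simply master time divided by patched
time, so values below 1 mean the patched build is slower. A minimal sketch of
that arithmetic follows, using the figures from the table; the function names
are illustrative only, and the time unit is not stated in the table, so the
code treats the values as plain numbers. Ratios printed here may differ from
the table in the last digit depending on rounding.

```python
# Timings copied from the table above: (rows, master, patched).
# The unit is not stated in the table; only the ratios matter here.
timings = [
    (1_000_000, 4.2, 0.7),
    (2_000_000, 9.2, 9.8),
    (3_000_000, 14.5, 15.3),
]


def speedup(master, patched):
    """Master time divided by patched time; below 1 means the patch is slower."""
    return master / patched


def time_change_percent(master, patched):
    """Relative change in run time going from master to patched, in percent."""
    return (patched - master) / master * 100


for rows, master, patched in timings:
    print(
        f"{rows:>9} rows: speedup {speedup(master, patched):.2f}, "
        f"time change {time_change_percent(master, patched):+.1f}%"
    )
```

A positive time change means the patched run took longer than master.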

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: Re: Abbreviated keys for Datum tuplesort at 2015-01-25 11:15:12 from Andrew Gierth

Responses

Re: Re: Abbreviated keys for Datum tuplesort at 2015-04-02 19:17:15 from Robert Haas
