From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
Cc: | Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Abbreviated keys for Numeric |
Date: | 2015-02-21 05:18:17 |
Message-ID: | 54E81519.308@2ndquadrant.com |
Lists: | pgsql-hackers |
Hi,
On 21.2.2015 02:06, Tomas Vondra wrote:
> On 21.2.2015 02:00, Andrew Gierth wrote:
>>>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>
>> >> Right...so don't test a datum sort case, since that isn't supported
>> >> at all in the master branch. Your test case is invalid for that
>> >> reason.
>>
>> Tomas> What do you mean by 'Datum sort case'?
>>
>> A case where the code path goes via tuplesort_begin_datum rather than
>> tuplesort_begin_heap.
>>
>> Tomas> The test I was using is this:
>>
>> Tomas> select percentile_disc(0) within group (order by randnum) from stuff;
>>
>> Sorting single columns in aggregate calls uses the Datum sort path (in
>> fact I think it's currently the only place that does).
>>
>> Do that test with _both_ the Datum and Numeric sort patches in place,
>> and you will see the effect. With only the Numeric patch, the numeric
>> abbrev code is not called.
>
> D'oh! Thanks for the explanation.
OK, so I've repeated the benchmarks with both patches applied, and I
think the results are interesting. I extended the benchmark a bit - see
the SQL script attached.
1) multiple queries

   select percentile_disc(0) within group (order by val) from stuff

   select count(distinct val) from stuff

   select * from
     (select * from stuff order by val offset 100000000000) foo

2) multiple data types - int, float, text and numeric

3) multiple scales - 1M, 2M, 3M, 4M and 5M rows
Each query was executed 10x, the timings were averaged. I do know some
of the data types don't benefit from the patches, but I included them to
get a sense of how noisy the results are.
I did the measurements for
1) master
2) master + datum_sort_abbrev.patch
3) master + datum_sort_abbrev.patch + numeric_sortsup.patch
and then computed the speedup for each type/scale combination (the
impact on all the queries is almost exactly the same).
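The message does not spell out the speedup formula. The sketch below assumes it is the average master timing divided by the average patched timing, expressed as a percentage, so values above 100% mean the patched build is faster. The function name and the timings are hypothetical and are not taken from the benchmark.

```python
from statistics import mean


def speedup_percent(master_runs, patched_runs):
    """Mean master time as a percentage of the mean patched time.

    Assumption: speedup = avg(master) / avg(patched); above 100% means
    the patched build is faster, below 100% means it is slower.
    """
    return 100.0 * mean(master_runs) / mean(patched_runs)


# Hypothetical timings in seconds, for illustration only.
master = [4.0, 4.5, 3.5]
patched = [1.0, 1.25, 0.75]

print(f"{speedup_percent(master, patched):.0f}%")
```

Under this definition, a patched run that takes a quarter of the master time shows up as 400%, and a run that takes slightly longer than master shows up just below 100%.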
Complete results are available here: http://bit.ly/1EA4mR9
I'll post the summary here, although some of the numbers are about
the other abbreviated keys patch.
1) datum_sort_abbrev.patch vs. master

   scale    float     int    numeric    text
   ---------------------------------------------
   1         101%     99%      105%     404%
   2         101%     98%       96%      98%
   3         101%    101%       99%      97%
   4         100%    101%       98%      95%
   5          99%     98%       93%      95%
2) datum_sort_abbrev.patch + numeric_sortsup.patch vs. master

   scale    float     int    numeric    text
   ---------------------------------------------
   1          97%     98%      374%     396%
   2         100%    101%      407%      96%
   3          99%    102%      407%      95%
   4          99%    101%      423%      92%
   5          95%     99%      411%      92%
I think the gains are pretty awesome - I mean, 400% speedup for Numeric
across the board? Yes please!
The gains for text are also very nice, although in this case that only
happens for the smallest scale (1M rows), and for larger scales it's
actually slower than current master :-(
It's not just rainbows and unicorns, though. With both patches applied,
text sorts get even slower (up to ~8% slower than master). It also seems
to impact float (which gets ~5% slower, for some reason). I don't see
how that could happen, so I suspect this might be noise.
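One rough way to judge whether a ~5% difference is noise is to compare it with the run-to-run spread of the individual timings. The sketch below is a crude heuristic under that assumption, not a proper significance test; the function names and sample timings are hypothetical.

```python
from statistics import mean, stdev


def relative_noise(runs):
    """Sample standard deviation as a fraction of the mean."""
    return stdev(runs) / mean(runs)


def looks_like_noise(master_runs, patched_runs):
    """True if the relative change in mean time does not exceed the
    larger of the two run-to-run spreads (a heuristic, not a test)."""
    change = abs(mean(patched_runs) - mean(master_runs)) / mean(master_runs)
    spread = max(relative_noise(master_runs), relative_noise(patched_runs))
    return change <= spread


# Hypothetical timings in seconds: a 5% shift with ~10% spread per run.
noisy_master = [10.0, 11.0, 9.0]
noisy_patched = [10.5, 11.5, 9.5]
print(looks_like_noise(noisy_master, noisy_patched))

# The same 5% shift with under 4% spread per run.
steady_master = [10.0, 10.5, 9.5, 10.2, 9.8]
steady_patched = [10.5, 11.0, 10.0, 10.7, 10.3]
print(looks_like_noise(steady_master, steady_patched))
```

With only averages available, as in the tables above, this check cannot be made; it needs the individual timings of the 10 runs.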
I'll repeat the tests on another machine after the weekend, and post an
update on whether the results are the same or significantly different.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
bench.sh | application/x-shellscript | 2.8 KB |