From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> |
Subject: | Re: Rework of collation code, extensibility |
Date: | 2023-01-26 23:47:13 |
Message-ID: | 64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com |
Lists: | pgsql-hackers |
Attached v9 and added perf numbers below.
I'm hoping to commit 0002 and 0003 soon-ish, maybe in a week or two;
please let me know if you want me to hold off. (I won't commit the GUCs
unless others find them generally useful; they are included here to
make my performance tests easier to reproduce.)
My primary motivation is still related to
https://commitfest.postgresql.org/41/3956/ but the combination of
cleaner code and a performance boost seems like reasonable
justification for this patch set independently.
There aren't any clear open items on this patch. Peter Eisentraut asked
me to focus this thread on the refactoring, which I've done by reducing
it to 2 patches, and I left multilib ICU up to the other thread. He
also questioned the increased line count, but I think the currently-low
line count is due to bad style. PeterG provided some review comments,
in particular about when to do the tiebreaking, which I addressed.
This patch has been around for a while, so I'll take a fresh look to
check for risk areas and re-run a few sanity checks. Of course more
feedback would also be welcome.
PERFORMANCE:
======
Setup:
======
base: master with v9-0001 applied (GUCs only)
refactor: master with v9-0001, v9-0002, v9-0003 applied
Note that I wasn't able to see any performance difference between the
base and master; v9-0001 just adds some GUCs to make testing easier.
glibc 2.35 ICU 70.1
gcc 11.3.0 LLVM 14.0.0
built with meson (uses -O3)
$ perl text_generator.pl 10000000 10 > /tmp/strings.utf8.txt
CREATE TABLE s (t TEXT);
COPY s FROM '/tmp/strings.utf8.txt';
VACUUM FREEZE s;
CHECKPOINT;
SET work_mem='10GB';
SET max_parallel_workers = 0;
SET max_parallel_workers_per_gather = 0;
=============
Test queries:
=============
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "C";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en_US";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";
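For anyone reproducing this, the full test matrix (two GUC settings, three collations, three runs each) could be generated as a psql script. The following is only a minimal sketch: it assumes the `sort_abbreviated_keys` GUC from v9-0001 can be changed per session with `SET`, and the function name `build_benchmark_script` and its `runs` parameter are hypothetical.

```python
COLLATIONS = ["C", "en_US", "en-US-x-icu"]


def build_benchmark_script(runs=3):
    """Return a psql script covering every GUC setting and collation."""
    # Session settings copied from the setup section of this message.
    lines = [
        "SET work_mem = '10GB';",
        "SET max_parallel_workers = 0;",
        "SET max_parallel_workers_per_gather = 0;",
    ]
    for abbreviated in ("false", "true"):
        # Assumption: the GUC added by v9-0001 is settable per session.
        lines.append(f"SET sort_abbreviated_keys = {abbreviated};")
        for collation in COLLATIONS:
            query = (
                "EXPLAIN ANALYZE SELECT t FROM s "
                f'ORDER BY t COLLATE "{collation}";'
            )
            # Repeat each query so a median can be taken afterwards.
            lines.extend([query] * runs)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_benchmark_script(), end="")
```

The output can be saved to a file and run with `psql -f` against a database that already contains table `s`.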
Timings are measured as the milliseconds to return the first tuple from
the Sort operator (as reported in EXPLAIN ANALYZE). Median of three
runs.
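To make the measurement concrete: in EXPLAIN ANALYZE text output, each node reports `actual time=<startup>..<total>`, and the startup value on the Sort node is the time to return its first tuple. The sketch below extracts that value and takes the median over several runs. The helper names `sort_startup_ms` and `median_startup_ms` are hypothetical, and the sample plan in the script uses made-up numbers for illustration only.

```python
import re
from statistics import median

# Matches the "actual time=<startup>..<total>" part of a plan node line.
ACTUAL_TIME = re.compile(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def sort_startup_ms(explain_output):
    """Return the startup time (ms) of the first Sort node in the plan."""
    for line in explain_output.splitlines():
        # "Sort Key:" and "Sort Method:" lines carry no timing and are skipped.
        if "Sort" in line:
            match = ACTUAL_TIME.search(line)
            if match:
                return float(match.group(1))
    raise ValueError("no Sort node with timing found")


def median_startup_ms(explain_outputs):
    """Median Sort startup time across several EXPLAIN ANALYZE outputs."""
    return median([sort_startup_ms(text) for text in explain_outputs])


if __name__ == "__main__":
    # Made-up plan text, not taken from the benchmark in this message.
    sample = (
        "Sort  (cost=0.00..1.00 rows=1 width=1) "
        "(actual time=100.000..150.000 rows=1 loops=1)\n"
        "  Sort Key: t\n"
        "  ->  Seq Scan on s  (cost=0.00..1.00 rows=1 width=1) "
        "(actual time=0.010..0.500 rows=1 loops=1)\n"
    )
    print(median_startup_ms([sample, sample, sample]))
```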
========
Results:
========
                  base   refactor   speedup
sort_abbreviated_keys=false:
  C               7377     7273      1.4%
  en_US          35081    35090      0.0%
  en-US-x-icu    20520    19465      5.4%
sort_abbreviated_keys=true:
  C               8105     8008      1.2%
  en_US          35067    34850      0.6%
  en-US-x-icu    22626    21507      5.2%
===========
Conclusion:
===========
These numbers can move +/-1 percentage point, so I'd interpret anything
less than that as noise. This happens to be the first run where all the
numbers either favored the refactoring patch or were within noise, but
it is generally consistent with what I had seen before.
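As a check on the arithmetic, the speedup column is consistent with `base / refactor - 1`, expressed as a percentage. That formula is inferred from the numbers and not stated explicitly above. The sketch below recomputes each value from the reported timings and flags anything under the 1-percentage-point noise threshold; the names `speedup_percent` and `is_noise` are hypothetical.

```python
# Timings in milliseconds from the results: (base, refactor),
# keyed by (sort_abbreviated_keys setting, collation).
RESULTS = {
    ("false", "C"): (7377, 7273),
    ("false", "en_US"): (35081, 35090),
    ("false", "en-US-x-icu"): (20520, 19465),
    ("true", "C"): (8105, 8008),
    ("true", "en_US"): (35067, 34850),
    ("true", "en-US-x-icu"): (22626, 21507),
}

NOISE_THRESHOLD = 1.0  # percentage points, per the conclusion


def speedup_percent(base_ms, refactor_ms):
    """Speedup of refactor over base (assumed formula: base/refactor - 1)."""
    return (base_ms / refactor_ms - 1.0) * 100.0


def is_noise(percent):
    """True when the change is smaller than the stated run-to-run variation."""
    return abs(percent) < NOISE_THRESHOLD


if __name__ == "__main__":
    for (abbreviated, collation), (base, refactor) in RESULTS.items():
        pct = speedup_percent(base, refactor)
        label = "noise" if is_noise(pct) else "above noise"
        print(f"sort_abbreviated_keys={abbreviated} {collation:<12} "
              f"{pct:5.1f}% ({label})")
```

Under this reading, only the two en_US rows fall within noise; both ICU rows and both C rows are above the threshold.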
The important part is that, for ICU, it appears to be a substantial
speedup (a bit over 5%) when using meson (-O3).
Also, when/if the multilib ICU support goes in, that may lose some of
these gains due to an extra indirect function call.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachment | Content-Type | Size |
---|---|---|
text_generator.pl | application/x-perl | 515 bytes |
v9-0001-Introduce-GUCs-to-control-abbreviated-keys-sort-o.patch | text/x-patch | 8.4 KB |
v9-0002-Add-pg_strcoll-pg_strxfrm-and-variants.patch | text/x-patch | 42.1 KB |
v9-0003-Refactor-pg_locale_t-routines.patch | text/x-patch | 43.5 KB |