From: | Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> |
---|---|
To: | pgsql-committers(at)postgresql(dot)org |
Subject: | pgsql: Use radix tree for character encoding conversions. |
Date: | 2017-03-13 18:47:23 |
Message-ID: | E1cnV07-0007li-6D@gemulon.postgresql.org |
Lists: | pgsql-committers |
Use radix tree for character encoding conversions.
Replace the mapping tables used to convert between UTF-8 and other
character encodings with new radix tree-based maps. Looking up an entry in
a radix tree is much faster than a binary search in the old maps. As a
bonus, the radix tree representation is also more compact, making the
binaries slightly smaller.
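To illustrate why a radix tree beats binary search here, below is a minimal, hypothetical sketch of a two-level radix tree for 2-byte source codes. The names `radix_map2` and `radix_lookup2` and the table layout are assumptions made for this example. They are not PostgreSQL's actual structures, which live in the modified `conv.c` and `pg_wchar.h`, handle 1- to 4-byte codes, and differ in detail. A lookup costs a few bounds checks and two array indexes no matter how large the table is. A binary search over the old flat maps needs O(log n) comparisons.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/*
 * Hypothetical two-level radix tree for 2-byte source codes (illustration
 * only, not PostgreSQL's real layout). level1 has one slot per first byte
 * in [b1_lower, b1_upper]; a slot holds the start offset of that byte's
 * block in level2, or 0 if no code starts with that byte. Each block has
 * one entry per second byte in [b2_lower, b2_upper].
 */
typedef struct
{
    uint8_t         b1_lower, b1_upper;  /* valid range of the first byte */
    uint8_t         b2_lower, b2_upper;  /* valid range of the second byte */
    const uint16_t *level1;              /* block offsets; 0 = empty */
    const uint32_t *level2;              /* mapped codes; 0 = no mapping */
} radix_map2;

/* Return the mapped code for the input bytes (b1, b2), or 0 if unmapped. */
static uint32_t
radix_lookup2(const radix_map2 *map, uint8_t b1, uint8_t b2)
{
    uint16_t    block;

    /* Bytes outside the stored ranges cannot have a mapping. */
    if (b1 < map->b1_lower || b1 > map->b1_upper ||
        b2 < map->b2_lower || b2 > map->b2_upper)
        return 0;

    /* Level 1: the first byte selects a block (or none). */
    block = map->level1[b1 - map->b1_lower];
    if (block == 0)
        return 0;

    /* Level 2: the second byte indexes directly into the block. */
    return map->level2[block + (b2 - map->b2_lower)];
}

/*
 * Demo data: EUC-JP 0xA4A1..0xA4A3 are the hiragana U+3041..U+3043, stored
 * here as their UTF-8 bytes packed into an integer. First byte 0xA5 has no
 * entries in this tiny demo. level2[0] is an unused sentinel so that an
 * offset of 0 can mean "empty".
 */
static const uint16_t demo_level1[] = {1, 0};
static const uint32_t demo_level2[] = {0, 0xE38181, 0xE38182, 0xE38183};
static const radix_map2 demo_map = {
    0xA4, 0xA5, 0xA1, 0xA3, demo_level1, demo_level2
};
```

For example, `radix_lookup2(&demo_map, 0xA4, 0xA2)` resolves EUC-JP "あ" to the packed UTF-8 bytes `0xE38182`. Part of the space saving mentioned above comes from empty level-1 slots, which cost one small integer instead of a full block. This sketch shows only that idea.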
The "combined" maps work the same as before, with binary search. They are
much smaller than the main tables, so their lookup speed matters less. However,
the "combined" maps are now stored in the same .map files as the main
tables. This seems clearer, since they're always used together and are
generated from the same source files.
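A "combined" map pairs two UTF-8 characters, such as a base kana followed by a combining mark, with a single target code, and is still searched by binary search. Below is a minimal, hypothetical sketch using the C standard library's `bsearch`. The `combined_entry` type, `combined_lookup` function, and demo table are assumptions made for this illustration. The UTF-8 pairs are real (か/き followed by U+309A), but the target codes are placeholders, not real EUC-JIS-2004 or Shift_JIS-2004 values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/* Hypothetical entry: two packed UTF-8 characters that map to one code. */
typedef struct
{
    uint32_t    utf1;   /* first character, UTF-8 bytes packed */
    uint32_t    utf2;   /* second (combining) character */
    uint32_t    code;   /* target code in the other encoding */
} combined_entry;

/* Order entries by (utf1, utf2); the table must be sorted the same way. */
static int
compare_combined(const void *p1, const void *p2)
{
    const combined_entry *a = p1;
    const combined_entry *b = p2;

    if (a->utf1 != b->utf1)
        return a->utf1 < b->utf1 ? -1 : 1;
    if (a->utf2 != b->utf2)
        return a->utf2 < b->utf2 ? -1 : 1;
    return 0;
}

/* Binary-search the sorted table; return the code, or 0 if the pair is absent. */
static uint32_t
combined_lookup(const combined_entry *table, size_t n,
                uint32_t utf1, uint32_t utf2)
{
    combined_entry key = {utf1, utf2, 0};
    const combined_entry *hit;

    hit = bsearch(&key, table, n, sizeof(combined_entry), compare_combined);
    return hit ? hit->code : 0;
}

/* Demo table, sorted by (utf1, utf2). Target codes are placeholders. */
static const combined_entry demo_combined[] = {
    {0xE3818B, 0xE3829A, 0x1001},   /* U+304B U+309A */
    {0xE3818D, 0xE3829A, 0x1002},   /* U+304D U+309A */
};
```

Because such a table holds only a handful of entries, O(log n) comparisons are cheap, which is why keeping binary search here is reasonable.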
Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages.
Reviewed by Michael Paquier and Daniel Gustafsson.
Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/aeed17d00037950a16cc5ebad5b5592e5fa1ad0f
Modified Files
--------------
src/backend/utils/mb/Unicode/Makefile | 10 +-
src/backend/utils/mb/Unicode/UCS_to_BIG5.pl | 12 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl | 10 +-
.../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl | 22 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl | 189 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl | 14 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl | 10 +-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl | 10 +-
src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl | 12 +-
.../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl | 21 +-
src/backend/utils/mb/Unicode/UCS_to_SJIS.pl | 32 +-
src/backend/utils/mb/Unicode/UCS_to_UHC.pl | 12 +-
src/backend/utils/mb/Unicode/UCS_to_most.pl | 6 +-
src/backend/utils/mb/Unicode/big5_to_utf8.map | 18321 ++------
src/backend/utils/mb/Unicode/convutils.pm | 806 +-
src/backend/utils/mb/Unicode/euc_cn_to_utf8.map | 9723 +----
.../utils/mb/Unicode/euc_jis_2004_to_utf8.map | 14744 ++-----
.../mb/Unicode/euc_jis_2004_to_utf8_combined.map | 29 -
src/backend/utils/mb/Unicode/euc_jp_to_utf8.map | 17337 ++------
src/backend/utils/mb/Unicode/euc_kr_to_utf8.map | 10723 ++---
src/backend/utils/mb/Unicode/euc_tw_to_utf8.map | 31407 ++++----------
src/backend/utils/mb/Unicode/gb18030_to_utf8.map | 41882 +++++--------------
src/backend/utils/mb/Unicode/gbk_to_utf8.map | 28344 +++----------
.../utils/mb/Unicode/iso8859_10_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_13_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_14_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_15_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_16_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map | 205 +-
src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map | 198 +-
src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map | 205 +-
src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map | 158 +-
src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map | 234 +-
src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map | 201 +-
src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map | 205 +-
src/backend/utils/mb/Unicode/johab_to_utf8.map | 23327 +++--------
src/backend/utils/mb/Unicode/koi8r_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/koi8u_to_utf8.map | 237 +-
.../utils/mb/Unicode/shift_jis_2004_to_utf8.map | 14503 ++-----
.../mb/Unicode/shift_jis_2004_to_utf8_combined.map | 29 -
src/backend/utils/mb/Unicode/sjis_to_utf8.map | 10202 ++---
src/backend/utils/mb/Unicode/uhc_to_utf8.map | 23788 +++--------
src/backend/utils/mb/Unicode/utf8_to_big5.map | 17809 ++------
src/backend/utils/mb/Unicode/utf8_to_euc_cn.map | 11487 ++---
.../utils/mb/Unicode/utf8_to_euc_jis_2004.map | 23868 ++++++-----
.../mb/Unicode/utf8_to_euc_jis_2004_combined.map | 29 -
src/backend/utils/mb/Unicode/utf8_to_euc_jp.map | 20314 ++++-----
src/backend/utils/mb/Unicode/utf8_to_euc_kr.map | 14617 +++----
src/backend/utils/mb/Unicode/utf8_to_euc_tw.map | 24574 +++--------
src/backend/utils/mb/Unicode/utf8_to_gb18030.map | 40292 +++++-------------
src/backend/utils/mb/Unicode/utf8_to_gbk.map | 26061 ++----------
.../utils/mb/Unicode/utf8_to_iso8859_10.map | 240 +-
.../utils/mb/Unicode/utf8_to_iso8859_13.map | 239 +-
.../utils/mb/Unicode/utf8_to_iso8859_14.map | 272 +-
.../utils/mb/Unicode/utf8_to_iso8859_15.map | 227 +-
.../utils/mb/Unicode/utf8_to_iso8859_16.map | 257 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_2.map | 240 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_3.map | 232 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_4.map | 240 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_5.map | 229 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_6.map | 171 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_7.map | 248 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_8.map | 194 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_9.map | 226 +-
src/backend/utils/mb/Unicode/utf8_to_johab.map | 23380 +++--------
src/backend/utils/mb/Unicode/utf8_to_koi8r.map | 301 +-
src/backend/utils/mb/Unicode/utf8_to_koi8u.map | 312 +-
.../utils/mb/Unicode/utf8_to_shift_jis_2004.map | 18954 ++++-----
.../mb/Unicode/utf8_to_shift_jis_2004_combined.map | 29 -
src/backend/utils/mb/Unicode/utf8_to_sjis.map | 11648 ++----
src/backend/utils/mb/Unicode/utf8_to_uhc.map | 23612 +++--------
src/backend/utils/mb/Unicode/utf8_to_win1250.map | 266 +-
src/backend/utils/mb/Unicode/utf8_to_win1251.map | 259 +-
src/backend/utils/mb/Unicode/utf8_to_win1252.map | 267 +-
src/backend/utils/mb/Unicode/utf8_to_win1253.map | 244 +-
src/backend/utils/mb/Unicode/utf8_to_win1254.map | 276 +-
src/backend/utils/mb/Unicode/utf8_to_win1255.map | 260 +-
src/backend/utils/mb/Unicode/utf8_to_win1256.map | 320 +-
src/backend/utils/mb/Unicode/utf8_to_win1257.map | 259 +-
src/backend/utils/mb/Unicode/utf8_to_win1258.map | 284 +-
src/backend/utils/mb/Unicode/utf8_to_win866.map | 280 +-
src/backend/utils/mb/Unicode/utf8_to_win874.map | 225 +-
src/backend/utils/mb/Unicode/win1250_to_utf8.map | 232 +-
src/backend/utils/mb/Unicode/win1251_to_utf8.map | 236 +-
src/backend/utils/mb/Unicode/win1252_to_utf8.map | 232 +-
src/backend/utils/mb/Unicode/win1253_to_utf8.map | 220 +-
src/backend/utils/mb/Unicode/win1254_to_utf8.map | 230 +-
src/backend/utils/mb/Unicode/win1255_to_utf8.map | 214 +-
src/backend/utils/mb/Unicode/win1256_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/win1257_to_utf8.map | 225 +-
src/backend/utils/mb/Unicode/win1258_to_utf8.map | 228 +-
src/backend/utils/mb/Unicode/win866_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/win874_to_utf8.map | 204 +-
src/backend/utils/mb/conv.c | 251 +-
.../conversion_procs/utf8_and_big5/utf8_and_big5.c | 4 +-
.../utf8_and_cyrillic/utf8_and_cyrillic.c | 8 +-
.../utf8_and_euc2004/utf8_and_euc2004.c | 6 +-
.../utf8_and_euc_cn/utf8_and_euc_cn.c | 4 +-
.../utf8_and_euc_jp/utf8_and_euc_jp.c | 4 +-
.../utf8_and_euc_kr/utf8_and_euc_kr.c | 4 +-
.../utf8_and_euc_tw/utf8_and_euc_tw.c | 4 +-
.../utf8_and_gb18030/utf8_and_gb18030.c | 4 +-
.../conversion_procs/utf8_and_gbk/utf8_and_gbk.c | 4 +-
.../utf8_and_iso8859/utf8_and_iso8859.c | 75 +-
.../utf8_and_johab/utf8_and_johab.c | 4 +-
.../conversion_procs/utf8_and_sjis/utf8_and_sjis.c | 4 +-
.../utf8_and_sjis2004/utf8_and_sjis2004.c | 6 +-
.../conversion_procs/utf8_and_uhc/utf8_and_uhc.c | 4 +-
.../conversion_procs/utf8_and_win/utf8_and_win.c | 54 +-
src/include/mb/pg_wchar.h | 84 +-
111 files changed, 147742 insertions(+), 367346 deletions(-)