From: | Achilleas Mantzios - cloud <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Slow performance of collate "en_US.utf8" |
Date: | 2025-02-28 12:59:38 |
Message-ID: | 82b7a507-2944-447c-b762-ec65dc2e0047@cloud.gatewaynet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi Alexey
On 2/27/25 15:54, Alexey Borschev wrote:
>
> Hi everyone!
>
> I see poor performance of text sorting of collate "en_US.utf8" in PG 17.4.
>
> Test query:
>
> explain (analyze, costs, buffers, verbose)
>
> SELECT
>
> ('БвЁжЫйяЙSёYz&$' || gen.id) collate "en_US.utf8"
>
> FROM generate_series(1, 10000) AS gen(id)
>
> order by 1 desc;
>
> I've got execution time like:
>
> Execution Time: 73.068 ms
>
> Same poor result with ru_RU.UTF8.
>
> With other collations time is much better:
>
> explain (analyze, costs, buffers, verbose)
>
> select
>
> ('БвЁжЫйяЙSёYz&$' || gen.id) collate "C"
>
> from generate_series(1, 10000) AS gen(id)
>
> order by 1 desc;
>
> Execution Time: 4.792 ms
>
> explain (analyze, costs, buffers, verbose)
>
> SELECT
>
> ('БвЁжЫйяЙSёYz&$' || gen.id) collate "C.utf8"
>
> FROM generate_series(1, 10000) AS gen(id)
>
> order by 1 desc;
>
> Execution Time: 7.473 ms
>
> explain (analyze, costs, buffers, verbose)
>
> select
>
> ('БвЁжЫйяЙSёYz&$' || gen.id) collate "und-x-icu"
>
> from generate_series(1, 10000) AS gen(id)
>
> order by 1 desc;
>
> Execution Time: 13.282 ms
>
> Yes, collate C is fastest, ICU collations is ~ 2 slower then C, but
> "en_US.utf8" is ~ 10x slower!
>
> I suspect it is some performance issue over there.
>
> Can someone of PG hackers reproduce this please?
>
> 1) This PG17 instance was installed with default options, and initdb
> got en_US.utf8 as system default collation and created PG cluster with it.
>
> It seems like most PG databases are created this way with en_US.utf8
> by default.
>
It seems you initialized the cluster with libc as the locale provider.
Have you tried with icu ?
>
> 2) Typical text\varchar columns are created with DB default en_US.utf8
> and performs poor.
>
> explain (analyze, costs, buffers, verbose)
>
> select
>
> ('БвЁжЫйяЙSёYz&$' || gen.id)
>
> from generate_series(1, 10000) AS gen(id)
>
> order by 1 desc;
>
> Execution Time: 73.600 ms
>
> 3) The index search operations are also slower with en_US.utf8, but
> the difference is not as high.
>
> Please see attached file with test table and indexes, with tests on
> index performance.
>
> System Details:
>
> postgres(at)borschev-pg-copydb1:~$ uname -a Linux borschev-pg-copydb1
> 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)
> x86_64 GNU/Linux
>
> postgres(at)borschev-pg-copydb1:~$ cat /etc/issue Debian GNU/Linux
> trixie/sid \n \l
>
> select version();
>
> PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu,
> compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
>
> SELECT * FROM pg_config();
>
> |name |setting |
>
> |-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
>
> |BINDIR |/usr/lib/postgresql/17/bin |
>
> |DOCDIR |/usr/share/doc/postgresql-doc-17 |
>
> |HTMLDIR |/usr/share/doc/postgresql-doc-17 |
>
> |INCLUDEDIR |/usr/include/postgresql |
>
> |PKGINCLUDEDIR |/usr/include/postgresql |
>
> |INCLUDEDIR-SERVER|/usr/include/postgresql/17/server |
>
> |LIBDIR |/usr/lib/x86_64-linux-gnu |
>
> |PKGLIBDIR |/usr/lib/postgresql/17/lib |
>
> |LOCALEDIR |/usr/share/locale |
>
> |MANDIR |/usr/share/postgresql/17/man |
>
> |SHAREDIR |/usr/share/postgresql/17 |
>
> |SYSCONFDIR |/etc/postgresql-common |
>
> |PGXS |/usr/lib/postgresql/17/lib/pgxs/src/makefiles/pgxs.mk |
>
> |CONFIGURE | '--build=x86_64-linux-gnu' '--prefix=/usr'
> '--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
> '--infodir=${prefix}/share/info' '--sysconfdir=/etc'
> '--localstatedir=/var' '--disable-option-checking'
> '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu'
> '--runstatedir=/run' '--disable-maintainer-mode'
> '--disable-dependency-tracking' '--with-tcl' '--with-perl'
> '--with-python' '--with-pam' '--with-openssl' '--with-libxml'
> '--with-libxslt' '--mandir=/usr/share/postgresql/17/man'
> '--docdir=/usr/share/doc/postgresql-doc-17'
> '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/'
> '--datadir=/usr/share/postgresql/17'
> '--bindir=/usr/lib/postgresql/17/bin'
> '--libdir=/usr/lib/x86_64-linux-gnu/'
> '--libexecdir=/usr/lib/postgresql/'
> '--includedir=/usr/include/postgresql/' '--with-extra-version= (Debian
> 17.4-1.pgdg110+2)' '--enable-nls' '--enable-thread-safety'
> '--enable-debug' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld'
> '--with-gssapi' '--with-ldap' '--with-pgport=5432'
> '--with-system-tzdata=/usr/share/zoneinfo' 'AWK=mawk'
> 'MKDIR_P=/bin/mkdir -p' 'PROVE=/usr/bin/prove'
> 'PYTHON=/usr/bin/python3' 'TAR=/bin/tar' 'XSLTPROC=xsltproc --nonet'
> 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat
> -Werror=format-security -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-z,relro
> -Wl,-z,now' '--enable-tap-tests' '--with-icu' '--with-llvm'
> 'LLVM_CONFIG=/usr/bin/llvm-config-16' 'CLANG=/usr/bin/clang-16'
> '--with-lz4' '--with-zstd' '--with-systemd' '--with-selinux'
> '--enable-dtrace' 'build_alias=x86_64-linux-gnu' 'CPPFLAGS=-Wdate-time
> -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fstack-protector-strong
> -Wformat -Werror=format-security'|
>
> |CC |gcc |
>
> |CPPFLAGS |-Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE
> -I/usr/include/libxml2 |
>
> |CFLAGS |-Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3
> -Wcast-function-type -Wshadow=compatible-local -Wformat-security
> -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -g -g -O2
> -fstack-protector-strong -Wformat -Werror=format-security
> -fno-omit-frame-pointer |
>
> |CFLAGS_SL |-fPIC |
>
> |LDFLAGS |-Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-16/lib -Wl,--as-needed |
>
> |LDFLAGS_EX | |
>
> |LDFLAGS_SL | |
>
> |LIBS |-lpgcommon -lpgport -lselinux -lzstd -llz4 -lxslt -lxml2 -lpam
> -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lpthread -lrt -ldl -lm |
>
> |VERSION |PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2)
>
> select * from pg_database;
>
> |oid |datname
> |datdba|encoding|datlocprovider|datistemplate|datallowconn|dathasloginevt|datconnlimit|datfrozenxid|datminmxid|dattablespace|datcollate
> |datctype |datlocale|daticurules|datcollversion|datacl |
>
> |------|---------|------|--------|--------------|-------------|------------|--------------|------------|------------|----------|-------------|-----------|-----------|---------|-----------|--------------|-----------------------------------|
>
> |5 |postgres |10 |6 |c |false |true |false |-1 |730 |1 |1,663
> |en_US.UTF-8|en_US.UTF-8| | |2.40 | |
>
> |16,388|demo |10 |6 |c |false |true |false |-1 |730 |1 |1,663
> |en_US.UTF-8|en_US.UTF-8| | |2.40 | |
>
> |1 |template1|10 |6 |c |true |true |false |-1 |730 |1 |1,663
> |en_US.UTF-8|en_US.UTF-8| | |2.40 |{=c/postgres,postgres=CTc/postgres}|
>
> |4 |template0|10 |6 |c |true |false |false |-1 |730 |1 |1,663
> |en_US.UTF-8|en_US.UTF-8| | | |{=c/postgres,postgres=CTc/postgres}|
>
> Collations are standard, out-of-the-box:
>
> select collname, collprovider, collencoding, collcollate, collctype,
> colllocale,
>
> collversion, collisdeterministic
>
> FROM pg_collation cll
>
> --where cll.collname in ('C.utf8', 'en_US.utf8', 'ru-RU-x-icu')
>
> |collname |collprovider|collencoding|collcollate|collctype
> |colllocale|collversion|collisdeterministic|
>
> |-----------|------------|------------|-----------|----------|----------|-----------|-------------------|
>
> |default |d |-1 | | | | |true |
>
> |C |c |-1 |C |C | | |true |
>
> |POSIX |c |-1 |POSIX |POSIX | | |true |
>
> |ucs_basic |b |6 | | |C |1 |true |
>
> |pg_c_utf8 |b |6 | | |C.UTF-8 |1 |true |
>
> |unicode |i |-1 | | |und |153.14 |true |
>
> |C.utf8 |c |6 |C.utf8 |C.utf8 | | |true |
>
> |en_US.utf8 |c |6 |en_US.utf8 |en_US.utf8| |2.40 |true |
>
> |ru_RU.utf8 |c |6 |ru_RU.utf8 |ru_RU.utf8| |2.40 |true |
>
> |en_US |c |6 |en_US.utf8 |en_US.utf8| |2.40 |true |
>
> |ru_RU |c |6 |ru_RU.utf8 |ru_RU.utf8| |2.40 |true |
>
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2025-02-28 14:16:40 | Re: Slow performance of collate "en_US.utf8" |
Previous Message | Alexey Borschev | 2025-02-27 13:54:02 | Slow performance of collate "en_US.utf8" |