From: | Andrey Stikheev <andrey(dot)stikheev(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Performance degradation in Index searches with special characters |
Date: | 2024-10-05 09:23:56 |
Message-ID: | CALM5VP9ePykYMnz+0xf_+OKjV_raeHNWzerCBwE5ykRSCcd+AA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Dear PostgreSQL Community,
I am facing significant performance issues when executing queries that
involve string comparisons with special characters, such as <, #, !, @,
etc., especially when dealing with long strings. The query execution time
increases drastically when these characters are used, whereas queries with
alphabetic characters do not show such degradation. This behavior is
observed both on macOS (using the official postgres:17 image via Docker)
and on an Ubuntu 20.04 server running PostgreSQL in an LXC container.
Here is a minimal example:
testdb=# SELECT version();
version
---------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu,
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)
testdb=# CREATE TABLE test (value VARCHAR(10) NOT NULL);
CREATE TABLE
Time: 3.562 ms
testdb=# CREATE INDEX idx_test ON test (value);
CREATE INDEX
Time: 3.080 ms
testdb=# INSERT INTO test (value) VALUES ('<');
INSERT 0 1
Time: 3.365 ms
testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536);
?column?
----------
(0 rows)
Time: 4454.535 ms (00:04.455)
testdb=# SELECT 1 FROM test WHERE value = repeat('a', 65536);
?column?
----------
(0 rows)
Time: 3.772 ms
testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536) || 'a';
?column?
----------
(0 rows)
Time: 4.352 ms
Time: 9.503 ms
testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536) COLLATE "C";
?column?
----------
(0 rows)
Time: 3.299 ms
testdb=# SELECT 1 FROM test WHERE value = repeat('@', 8192);
?column?
----------
(0 rows)
Time: 77.171 ms
testdb=# SELECT 1 FROM test WHERE value = repeat('@', 16384);
?column?
----------
(0 rows)
Time: 325.190 ms
testdb=# SELECT 1 FROM test WHERE value = repeat('@', 32768);
?column?
----------
(0 rows)
Time: 1154.850 ms (00:01.155)
testdb=# SELECT 1 FROM test WHERE value = repeat('@', 65536);
?column?
----------
(0 rows)
Time: 4490.206 ms (00:04.490)
testdb=# explain (analyze, verbose, buffers, costs, settings, timing, wal)
SELECT 1 FROM test WHERE value = repeat('<', 65000);
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=4.20..13.67 rows=6 width=4) (actual
time=4425.459..4425.459 rows=0 loops=1)
Output: 1
Recheck Cond: ((test.value)::text = '<<<<<<<<<<<<<<<<<<<<<<<<<<<< ... a
lot symbols ...
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'::text)
Buffers: shared hit=1
-> Bitmap Index Scan on idx_test (cost=0.00..4.20 rows=6 width=0)
(actual time=4425.432..4425.432 rows=0 loops=1)
Index Cond: ((test.value)::text = '<<<<<<<<<<<<<<<<<<<<<<<<<<<<
... a lot symbols ...<<<<<<<<<<<<<<<<<<<<<<<<<<'::text)
Buffers: shared hit=1
Planning Time: 1.082 ms
Execution Time: 4425.602 ms
(9 rows)
Time: 4433.001 ms (00:04.433)
Observations*:*
- The performance degradation occurs with certain special characters
like < , !, >, @ , #, ... .
- Queries using alphabetic characters or appending/prepending characters
execute much faster.
- The execution time increases exponentially with the length of the
string composed of special characters.
- Changing the collation to 'C' in the query significantly improves
performance.
Questions*:*
- Is this performance degradation expected due to collation handling of
certain special characters in PostgreSQL?
- Are there any recommendations to improve performance without changing
the column or database collation?
--
Best regards,
Andrey Stikheev
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-10-06 16:39:38 | Re: Performance degradation in Index searches with special characters |
Previous Message | Pavel Stehule | 2024-09-22 08:43:22 | Re: proposal: schema variables |