Performance degradation in Index searches with special characters

From: Andrey Stikheev <andrey(dot)stikheev(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Performance degradation in Index searches with special characters
Date: 2024-10-05 09:23:56
Message-ID: CALM5VP9ePykYMnz+0xf_+OKjV_raeHNWzerCBwE5ykRSCcd+AA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear PostgreSQL Community,

I am facing significant performance issues when executing queries that
involve string comparisons with special characters, such as <, #, !, @,
etc., especially when dealing with long strings. The query execution time
increases drastically when these characters are used, whereas queries with
alphabetic characters do not show such degradation. This behavior is
observed both on macOS (using the official postgres:17 image via Docker)
and on an Ubuntu 20.04 server running PostgreSQL in an LXC container.

Here is a minimal example:

testdb=# SELECT version();

version

---------------------------------------------------------------------------------------------------------------------------

PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu,
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

(1 row)

testdb=# CREATE TABLE test (value VARCHAR(10) NOT NULL);
CREATE TABLE
Time: 3.562 ms

testdb=# CREATE INDEX idx_test ON test (value);
CREATE INDEX
Time: 3.080 ms

testdb=# INSERT INTO test (value) VALUES ('<');
INSERT 0 1
Time: 3.365 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536);
?column?
----------
(0 rows)
Time: 4454.535 ms (00:04.455)

testdb=# SELECT 1 FROM test WHERE value = repeat('a', 65536);
?column?
----------
(0 rows)
Time: 3.772 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536) || 'a';
?column?
----------
(0 rows)
Time: 4.352 ms

Time: 9.503 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536) COLLATE "C";

?column?

----------

(0 rows)

Time: 3.299 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 8192);

?column?

----------

(0 rows)

Time: 77.171 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 16384);

?column?

----------

(0 rows)

Time: 325.190 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 32768);

?column?

----------

(0 rows)

Time: 1154.850 ms (00:01.155)

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 65536);

?column?

----------

(0 rows)

Time: 4490.206 ms (00:04.490)

testdb=# explain (analyze, verbose, buffers, costs, settings, timing, wal)
SELECT 1 FROM test WHERE value = repeat('<', 65000);

---------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=4.20..13.67 rows=6 width=4) (actual
time=4425.459..4425.459 rows=0 loops=1)
Output: 1
Recheck Cond: ((test.value)::text = '<<<<<<<<<<<<<<<<<<<<<<<<<<<< ... a
lot symbols ...

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'::text)
Buffers: shared hit=1
-> Bitmap Index Scan on idx_test (cost=0.00..4.20 rows=6 width=0)
(actual time=4425.432..4425.432 rows=0 loops=1)
Index Cond: ((test.value)::text = '<<<<<<<<<<<<<<<<<<<<<<<<<<<<
... a lot symbols ...<<<<<<<<<<<<<<<<<<<<<<<<<<'::text)
Buffers: shared hit=1
Planning Time: 1.082 ms
Execution Time: 4425.602 ms
(9 rows)

Time: 4433.001 ms (00:04.433)

Observations*:*

- The performance degradation occurs with certain special characters
like < , !, >, @ , #, ... .
- Queries using alphabetic characters or appending/prepending characters
execute much faster.
- The execution time increases exponentially with the length of the
string composed of special characters.
- Changing the collation to 'C' in the query significantly improves
performance.

Questions*:*

- Is this performance degradation expected due to collation handling of
certain special characters in PostgreSQL?
- Are there any recommendations to improve performance without changing
the column or database collation?

--
Best regards,
Andrey Stikheev

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2024-10-06 16:39:38 Re: Performance degradation in Index searches with special characters
Previous Message Pavel Stehule 2024-09-22 08:43:22 Re: proposal: schema variables