Re: Performance degradation in Index searches with special characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrey Stikheev <andrey(dot)stikheev(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Performance degradation in Index searches with special characters
Date: 2024-10-06 16:39:38
Message-ID: 1353533.1728232778@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andrey Stikheev <andrey(dot)stikheev(at)gmail(dot)com> writes:
> - Changing the collation to 'C' in the query significantly improves
> performance.

What collation are you using, pray tell? (And what database encoding?)

> - Is this performance degradation expected due to collation handling of
> certain special characters in PostgreSQL?

It seems like a performance bug indeed, but not ours --- I'm thinking
it must be in strcoll() or ICU.

Trying it here (on a RHEL8 machine) with en_US.utf8 locale, I see
something similar but not quite as bad:

u8=# SELECT 1 FROM test WHERE value = repeat('<', 65536);
?column?
----------
(0 rows)

Time: 1383.033 ms (00:01.383)

Poking into it with gdb says that the time is indeed spent inside
strcoll:

#0 get_next_seq (pass=1, indirect=0x7fabb4988ea8, extra=0x7fabb4984900 "",
table=0x7fabb490e2b0, weights=<optimized out>,
rulesets=0x7fabb490e2a8 "\001\002\001\005\001\001\001\005", nrules=4,
seq=<synthetic pointer>) at strcoll_l.c:111
#1 __GI___strcoll_l (s1=0x1785878 "<",
s2=0x178587a '<' <repeats 200 times>..., l=<optimized out>)
at strcoll_l.c:338
#2 0x00000000009527a6 in strncoll_libc (arg1=<optimized out>, len1=1,
arg2=<optimized out>, len2=65536, locale=<optimized out>,
locale=<optimized out>) at pg_locale.c:1964
#3 0x00000000009ac760 in varstr_cmp (arg1=0x7fabc2dcbfe9 "<", len1=1,
arg2=0x17958cc '<' <repeats 200 times>..., len2=65536,
collid=<optimized out>) at varlena.c:1567
#4 0x00000000009acfe3 in bttextcmp (fcinfo=0x7ffddd3b0590) at varlena.c:1820
#5 0x00000000009d75fa in FunctionCall2Coll (
flinfo=flinfo(at)entry=0x7ffddd3b10e8, collation=<optimized out>,
arg1=<optimized out>, arg2=<optimized out>) at fmgr.c:1161
#6 0x0000000000594948 in _bt_compare (rel=0x7fabcde7eed0, key=0x7ffddd3b10c0,
page=<optimized out>, offnum=<optimized out>) at nbtsearch.c:762
#7 0x0000000000594e32 in _bt_binsrch (rel=rel(at)entry=0x7fabcde7eed0,
key=key(at)entry=0x7ffddd3b10c0, buf=<optimized out>) at nbtsearch.c:394

It's not the fault of the index machinery, because a single comparison
takes the same amount of time:

u8=# select '<' <= repeat('<', 65536);
?column?
----------
t
(1 row)

Time: 1391.550 ms (00:01.392)

I didn't try it with ICU.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrey Stikheev 2024-10-06 17:28:29 Re: Performance degradation in Index searches with special characters
Previous Message Andrey Stikheev 2024-10-05 09:23:56 Performance degradation in Index searches with special characters