Re: Terribly slow query with very good plan?

From: Les <nagylzs(at)gmail(dot)com>
To: Nick Cleaton <nick(at)cleaton(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Terribly slow query with very good plan?
Date: 2022-02-04 12:27:28
Message-ID: CAKXe9UCDc=rM7GTrb08oboD9qttEfuo_8hygLEMsjhS5wOLVcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Nick Cleaton <nick(at)cleaton(dot)net> ezt írta (időpont: 2022. febr. 4., P,
11:57):

>
> With the ^@ operator, my guess is that because the planner knows
> nothing about the folder name value it could be the empty string,
> which would be a prefix of everything.
>

I think I could narrow down the problem to the simplest query possible.

The "title could be empty" does not hold for this:

CREATE index test ON media.oo_file (relpath COLLATE "C");

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath ^@ 'Természettudomány' limit 1

Limit (cost=0.00..2.70 rows=1 width=8) (actual time=14445.559..14445.561
rows=0 loops=1)
Output: id
Buffers: shared hit=22288 read=108975
-> Seq Scan on media.oo_file fi (cost=0.00..144710.65 rows=53574
width=8) (actual time=14445.555..14445.556 rows=0 loops=1)
Output: id
Filter: (fi.is_active AND (fi.relpath ^@ 'Természettudomány'::text))
Rows Removed by Filter: 1075812
Buffers: shared hit=22288 read=108975
Planning Time: 0.398 ms
Execution Time: 14445.593 ms

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'Természettudomány%' limit 1

Limit (cost=0.00..2.70 rows=1 width=8) (actual time=11222.280..11222.282
rows=0 loops=1)
Output: id
Buffers: shared hit=22320 read=108943
-> Seq Scan on media.oo_file fi (cost=0.00..144710.65 rows=53574
width=8) (actual time=11222.278..11222.279 rows=0 loops=1)
Output: id
Filter: (fi.is_active AND (fi.relpath ~~
'Természettudomány%'::text))
Rows Removed by Filter: 1075812
Buffers: shared hit=22320 read=108943
Planning Time: 0.488 ms
Execution Time: 11222.307 ms

It is using seq scan for both cases. This is definitely wrong!

One of my collaguage has noticed that the LIKE query uses index scan for
some of the letters:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'A%' limit 1;
Limit (cost=0.55..60.85 rows=1 width=8) (actual time=6.508..6.509 rows=0
loops=1)
Output: id
Buffers: shared hit=2776
-> Index Scan using test on media.oo_file fi (cost=0.55..4583.29
rows=76 width=8) (actual time=6.506..6.507 rows=0 loops=1)
Output: id
Index Cond: ((fi.relpath >= 'A'::text) AND (fi.relpath < 'B'::text))
Filter: (fi.is_active AND (fi.relpath ~~ 'A%'::text))
Rows Removed by Filter: 3784
Buffers: shared hit=2776
Planning Time: 0.543 ms
Execution Time: 6.560 ms

Actually, the number of files per starting letter is:

select substr(relpath, 0, 2), count(*)
from media.oo_file
group by substr(relpath, 0, 2)
order by count(*) desc
substr|count |
------+------+
O |386087|
F |236752|
N |167171|
Ó |111479|
T |109786|
M | 34348|
P | 19878|
L | 5657|
A | 3784|
I | 869|
C | 1|

PostgreSQL uses seq scan for O, F, N, T letters, but it uses index scan for
A, I, C letters (with the "like" query).

There might be a problem with the planner here, because I think that using
an index scan will always be faster than a seq scan. The number of rows for
the prefix should not matter at all, because we are trying to get the first
matching row only. For some reason it decides between seq/index scan based
on the number of rows stored in some stats. At least it seems that way.

If I could tell the planner to use the index, I think my problem would be
solved. Is there a way to put optimizer hints into the query?

There could be some improvement made to the @^ operator too, because it
always uses seq scan, no matter what.

What do you think?

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nick Cleaton 2022-02-04 12:59:21 Re: Terribly slow query with very good plan?
Previous Message Nick Cleaton 2022-02-04 10:57:13 Re: Terribly slow query with very good plan?