Advice needed: query performance deteriorates by 2000% within 1 minute

From: Peter Adlersburg <peter(dot)adlersburg(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Advice needed: query performance deteriorates by 2000% within 1 minute
Date: 2022-02-24 13:53:12
Message-ID: CAD1Uk1ouyBstWpEj+aKYn6F5otKCuJMGkMtNuSrHXDOqgZuCVA@mail.gmail.com
Lists: pgsql-performance

Dear fellow DBAs,

I am seeking some guidance on the following case that our developers have
thrown at me, and I apologize in advance for this lengthy mail ...

$> postgres --version
postgres (PostgreSQL) 13.6

We are dealing with the following issue:

select version, content from orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@
      websearch_to_tsquery('english', '1.20709841') limit 10;

The devs told me that this query normally finishes within a reasonable
amount of time (< 1 sec), but every day, all of a sudden, performance
worsens to execution times of > 20 sec.

Furthermore, I was told:

"When we change the query to 'limit 100' it runs fast again"
"When we execute a 'vacuum orderstore.order' everything becomes good again
- but that only lasts for a few hours"

So I scheduled a little script to be executed every minute, which contains
three EXPLAIN (ANALYZE) runs (a sketch of the script body follows the list below):

1 query with limit 10
1 query with limit 100
1 query with the limit-clause omitted
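
For reference, the script body boils down to something like this (a sketch;
the scheduling wrapper, e.g. cron plus psql -f, is omitted here, and the
"LOG:" line just selects the current timestamp and the row count):

-- executed once per minute; output is appended to a log file
SELECT 'LOG:', now(), count(*) FROM orderstore."order";

EXPLAIN ANALYZE
SELECT version, content FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@
       websearch_to_tsquery('english', '1.20709841')
 LIMIT 10;

EXPLAIN ANALYZE
SELECT version, content FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@
       websearch_to_tsquery('english', '1.20709841')
 LIMIT 100;

EXPLAIN ANALYZE
SELECT version, content FROM orderstore."order"
 WHERE jsonb_to_tsvector('english', content, '["all"]') @@
       websearch_to_tsquery('english', '1.20709841');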

And here's a quick grep of the result after a few hours:

...

Execution Time: 1.413 ms <= limit 10
Execution Time: 0.389 ms <= limit 100
Execution Time: 0.297 ms <= limit clause omitted
Execution Time: 1.456 ms
Execution Time: 0.396 ms
Execution Time: 0.302 ms
Execution Time: 1.412 ms
Execution Time: 0.428 ms
Execution Time: 0.255 ms
Execution Time: 1.404 ms
Execution Time: 0.403 ms
Execution Time: 0.258 ms
Execution Time: 25588.448 ms <= limit 10
Execution Time: 0.919 ms <= limit 100
Execution Time: 0.453 ms <= limit clause omitted
Execution Time: 25657.524 ms
Execution Time: 0.965 ms
Execution Time: 0.452 ms
Execution Time: 25843.139 ms
Execution Time: 0.959 ms
Execution Time: 0.446 ms
Execution Time: 25631.389 ms
Execution Time: 0.946 ms
Execution Time: 0.447 ms
Execution Time: 25452.764 ms
Execution Time: 0.937 ms
Execution Time: 0.444 ms
<= here I manually vacuumed the table
Execution Time: 0.071 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
Execution Time: 0.072 ms
Execution Time: 0.023 ms
Execution Time: 0.017 ms
Execution Time: 0.064 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
Execution Time: 0.063 ms
Execution Time: 0.020 ms
Execution Time: 0.015 ms

...

It turned out the devs were right with their complaints.

The execution plan changed within one minute from an index scan to a
sequential scan.

Here are the details (in the "LOG:" line I select the current timestamp and
the row count of the table):

****************************
*** the last 'good' run: ***
****************************

LOG: | 2022-02-24 13:47:01.747416+01 | 9653

LIMIT 10:

Limit  (cost=752.37..789.30 rows=10 width=22) (actual time=1.388..1.390 rows=1 loops=1)
  ->  Bitmap Heap Scan on "order"  (cost=752.37..929.63 rows=48 width=22) (actual time=1.387..1.388 rows=1 loops=1)
        Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..752.36 rows=48 width=0) (actual time=1.374..1.374 rows=1 loops=1)
              Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.401 ms
Execution Time: 1.404 ms

LIMIT 100:

Limit  (cost=752.37..929.63 rows=48 width=22) (actual time=0.391..0.391 rows=1 loops=1)
  ->  Bitmap Heap Scan on "order"  (cost=752.37..929.63 rows=48 width=22) (actual time=0.390..0.391 rows=1 loops=1)
        Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..752.36 rows=48 width=0) (actual time=0.387..0.387 rows=1 loops=1)
              Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.136 ms
Execution Time: 0.403 ms

NO LIMIT:

Bitmap Heap Scan on "order"  (cost=752.37..929.63 rows=48 width=22) (actual time=0.248..0.249 rows=1 loops=1)
  Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..752.36 rows=48 width=0) (actual time=0.245..0.245 rows=1 loops=1)
        Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.107 ms
Execution Time: 0.258 ms

*********************************************
*** the first 'bad' run (one minute later) ***
*********************************************

LOG: | 2022-02-24 13:48:01.840362+01 | 9653

LIMIT 10:

Limit  (cost=0.00..804.97 rows=10 width=22) (actual time=23970.845..25588.432 rows=1 loops=1)
  ->  Seq Scan on "order"  (cost=0.00..3863.86 rows=48 width=22) (actual time=23970.843..25588.429 rows=1 loops=1)
        Filter: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
        Rows Removed by Filter: 9652
Planning Time: 0.430 ms
Execution Time: 25588.448 ms

LIMIT 100:

Limit  (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.902 rows=1 loops=1)
  ->  Bitmap Heap Scan on "order"  (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.901 rows=1 loops=1)
        Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..788.36 rows=48 width=0) (actual time=0.894..0.895 rows=1 loops=1)
              Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.187 ms
Execution Time: 0.919 ms

NO LIMIT:

Bitmap Heap Scan on "order"  (cost=788.37..965.63 rows=48 width=22) (actual time=0.442..0.442 rows=1 loops=1)
  Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..788.36 rows=48 width=0) (actual time=0.438..0.438 rows=1 loops=1)
        Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.151 ms
Execution Time: 0.453 ms

The table in question isn't that big:

  oid   | table_schema | table_name | row_estimate | total  | index  | toast | table
--------+--------------+------------+--------------+--------+--------+-------+-------
 155544 | orderstore   | order      |         9649 | 210 MB | 108 MB | 91 MB | 10 MB
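
(For completeness, the sizes above come from a catalog query along these
lines; this is a sketch of the idea, not necessarily the exact statement I
used:)

-- sketch: approximate total / index / toast / heap sizes for orderstore."order"
SELECT c.oid,
       n.nspname                                     AS table_schema,
       c.relname                                     AS table_name,
       c.reltuples::bigint                           AS row_estimate,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total,
       pg_size_pretty(pg_indexes_size(c.oid))        AS "index",
       CASE WHEN c.reltoastrelid <> 0
            THEN pg_size_pretty(pg_total_relation_size(c.reltoastrelid))
       END                                           AS toast,
       pg_size_pretty(pg_relation_size(c.oid))       AS "table"
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'orderstore'
   AND c.relname = 'order';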

Table DDL:

CREATE TABLE orderstore."order" (
    pk_id bigint DEFAULT nextval('orderstore.order_pk_id_seq'::regclass) NOT NULL,
    version integer NOT NULL,
    content jsonb NOT NULL,
    manipulation_history jsonb NOT NULL,
    CONSTRAINT chk_external_id_not_null CHECK (((content ->> 'externalId'::text) IS NOT NULL)),
    CONSTRAINT chk_id_not_null CHECK (((content ->> 'id'::text) IS NOT NULL))
);

DDL of the index used (one amongst many others that exist):

--
-- Name: idx_fulltext_content; Type: INDEX; Schema: orderstore; Owner: orderstore
--

CREATE INDEX idx_fulltext_content ON orderstore."order" USING gin
    (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb));

The record in pg_stat_all_tables before the manual vacuum:

relid | 155544
schemaname | orderstore
relname | order
seq_scan | 249
seq_tup_read | 2209150
idx_scan | 24696
idx_tup_fetch | 1155483
n_tup_ins | 87
n_tup_upd | 1404
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9653
n_dead_tup | 87
n_mod_since_analyze | 152
n_ins_since_vacuum | 4
last_vacuum | 2022-02-24 10:44:34.524241+01
last_autovacuum |
last_analyze | 2022-02-24 03:20:05.79219+01
last_autoanalyze |
vacuum_count | 3
autovacuum_count | 0
analyze_count | 8
autoanalyze_count | 0

The entry in pg_stat_all_tables after the manual vacuum:

relid | 155544
schemaname | orderstore
relname | order
seq_scan | 249
seq_tup_read | 2209150
idx_scan | 24753
idx_tup_fetch | 1155561
n_tup_ins | 87
n_tup_upd | 1404
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9476
n_dead_tup | 0
n_mod_since_analyze | 152
n_ins_since_vacuum | 0
last_vacuum | 2022-02-24 14:32:16.083692+01
last_autovacuum |
last_analyze | 2022-02-24 03:20:05.79219+01
last_autoanalyze |
vacuum_count | 4
autovacuum_count | 0
analyze_count | 8
autoanalyze_count | 0
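
(Both snapshots were taken with essentially this query, shown here as a
sketch; the record layout above is psql's expanded display:)

-- \x on in psql produces the expanded record format shown above
SELECT *
  FROM pg_stat_all_tables
 WHERE schemaname = 'orderstore'
   AND relname = 'order';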

Can someone provide any hints on how to deal with this issue? What am I
missing?

In case you need additional information, please let me know.

kind regards,

peter
