Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE

From: Jan Kort <jan(dot)kort(at)genetics(dot)nl>
To: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date: 2021-05-18 12:46:24
Message-ID: AM0PR0502MB3620D231418BE0B7B8A9E3E09B2C9@AM0PR0502MB3620.eurprd05.prod.outlook.com
Lists: pgsql-bugs

Hi,

I tried this on both PostgreSQL 12.7 and 13.3, with both VACUUM ANALYZE and VACUUM FULL ANALYZE; each of them changes a straightforward plan into a reversed, more complex plan that makes the query slow. The only thing that works is a pg_dump/pg_restore round trip, but that is not feasible.

It seems to depend on the specific data. I tried generating other variations of around 1M records, but so far the attached DB (also generated) is the only one where I can reproduce it; I can't figure out what is so special about this data.

Given the following record counts (a generation sketch approximating this distribution follows the DDL below):

GFO_ZAKEN: 1048587
GFO_ZAKEN_KOSTEN: 3
GFO_ZAKEN_TYPECODE: 4
GFO_ZAKEN.ZAAKTYPECODE_ID is non-null in 9 rows; all other rows are null

When I run this query:

EXPLAIN ANALYZE
SELECT GFO_ZAKEN_KOSTEN.ID AS GFO_ZAKEN_KOSTEN_ID,
       GFO_ZAKEN.ID AS GFO_ZAKEN_ID,
       GFO_ZAKEN_TYPECODE.OMSCHRIJVING AS ZAAKTYPECODE_ID
FROM TRIAL.GFO_ZAKEN_KOSTEN
JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
JOIN TRIAL.GFO_ZAKEN_TYPECODE ON GFO_ZAKEN.ZAAKTYPECODE_ID = GFO_ZAKEN_TYPECODE.ID
WHERE GFO_ZAKEN_KOSTEN.ID = 13;

It takes 500 ms or so, which I associate with a full table scan, even though the joins are over simple referential constraints with corresponding indexes:

CREATE TABLE TRIAL.GFO_ZAKEN_TYPECODE (ID INTEGER PRIMARY KEY, OMSCHRIJVING CHARACTER VARYING(4000));
CREATE TABLE TRIAL.GFO_ZAKEN (ID INTEGER PRIMARY KEY, ZAAKTYPECODE_ID INTEGER,
CONSTRAINT ZAAKTYPECODE_IDC1 FOREIGN KEY (ZAAKTYPECODE_ID) REFERENCES TRIAL.GFO_ZAKEN_TYPECODE (ID));
CREATE INDEX GFO_ZAKENO18 ON TRIAL.GFO_ZAKEN USING BTREE (ZAAKTYPECODE_ID);
CREATE TABLE TRIAL.GFO_ZAKEN_KOSTEN (ID INTEGER PRIMARY KEY, GFO_ZAKEN_ID INTEGER,
CONSTRAINT GFO_ZAKEN_IDC1 FOREIGN KEY (GFO_ZAKEN_ID) REFERENCES TRIAL.GFO_ZAKEN (ID));
CREATE INDEX GFO_ZAKEN_KOSTENO14 ON TRIAL.GFO_ZAKEN_KOSTEN USING BTREE (GFO_ZAKEN_ID ASC NULLS LAST);
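
For reference, here is a minimal sketch that approximates the distribution above (the ID values and descriptions are made up; as noted, my own generated variations did not reproduce the problem, so this is illustrative only):

INSERT INTO TRIAL.GFO_ZAKEN_TYPECODE (ID, OMSCHRIJVING)
SELECT g, 'type ' || g FROM generate_series(1, 4) AS g;

-- 1048587 rows; only the first 9 get a non-null ZAAKTYPECODE_ID
INSERT INTO TRIAL.GFO_ZAKEN (ID, ZAAKTYPECODE_ID)
SELECT g, CASE WHEN g <= 9 THEN 1 + (g % 4) ELSE NULL END
FROM generate_series(1, 1048587) AS g;

-- 3 rows; the last one has ID = 13 to match the WHERE clause above
INSERT INTO TRIAL.GFO_ZAKEN_KOSTEN (ID, GFO_ZAKEN_ID)
SELECT 10 + g, g FROM generate_series(1, 3) AS g;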

After pg_restore it gives a straightforward plan, starting with the gfo_zaken_kosten primary key and continuing with the join on gfo_zaken_kosten.gfo_zaken_id:

Nested Loop  (cost=0.56..3.64 rows=1 width=524) (actual time=0.036..0.037 rows=1 loops=1)
  ->  Nested Loop  (cost=0.43..3.48 rows=1 width=12) (actual time=0.030..0.030 rows=1 loops=1)
        ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=1)
              Filter: (id = 13)
              Rows Removed by Filter: 2
        ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..2.45 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
              Index Cond: (id = gfo_zaken_kosten.gfo_zaken_id)
  ->  Index Scan using gfo_zaken_typecodep on gfo_zaken_typecode  (cost=0.13..0.15 rows=1 width=520) (actual time=0.005..0.005 rows=1 loops=1)
        Index Cond: (id = gfo_zaken.zaaktypecode_id)
Planning Time: 1.538 ms
Execution Time: 0.095 ms

After VACUUM ANALYZE the plan becomes inefficient: it no longer starts with the gfo_zaken_kosten primary key but at the wrong end, with an index scan over all 1M rows of gfo_zaken:

Merge Join  (cost=1.48..1.59 rows=1 width=159) (actual time=619.374..619.376 rows=1 loops=1)
  Merge Cond: (gfo_zaken.id = gfo_zaken_kosten.gfo_zaken_id)
  ->  Nested Loop  (cost=0.43..96503.47 rows=1048587 width=155) (actual time=0.022..619.359 rows=9 loops=1)
        Join Filter: (gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id)
        Rows Removed by Join Filter: 4194316
        ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..33587.23 rows=1048587 width=8) (actual time=0.006..141.167 rows=1048587 loops=1)
        ->  Materialize  (cost=0.00..1.06 rows=4 width=155) (actual time=0.000..0.000 rows=4 loops=1048587)
              ->  Seq Scan on gfo_zaken_typecode  (cost=0.00..1.04 rows=4 width=155) (actual time=0.011..0.012 rows=4 loops=1)
  ->  Sort  (cost=1.05..1.05 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
        Sort Key: gfo_zaken_kosten.gfo_zaken_id
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
              Filter: (id = 13)
              Rows Removed by Filter: 2
Planning Time: 69.151 ms
Execution Time: 619.410 ms

So the query is 6520 times slower after VACUUM ANALYZE than before (619.410 ms vs. 0.095 ms).
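
The statistics that ANALYZE gathers for the mostly-null join column can be inspected through the standard pg_stats view; this is just a diagnostic sketch, not a fix:

SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'trial'
  AND tablename = 'gfo_zaken'
  AND attname = 'zaaktypecode_id';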

Doing VACUUM ANALYZE again doesn't make it better; only pg_dump + pg_restore brings back the original plan, and then it breaks again on the first VACUUM ANALYZE.
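
As a session-level experiment (not something we could use as a fix), the planner can be steered away from the merge join to confirm it still considers the nested-loop plan:

SET enable_mergejoin = off;  -- experiment only, session-local

EXPLAIN ANALYZE
SELECT GFO_ZAKEN_KOSTEN.ID AS GFO_ZAKEN_KOSTEN_ID,
       GFO_ZAKEN.ID AS GFO_ZAKEN_ID,
       GFO_ZAKEN_TYPECODE.OMSCHRIJVING AS ZAAKTYPECODE_ID
FROM TRIAL.GFO_ZAKEN_KOSTEN
JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
JOIN TRIAL.GFO_ZAKEN_TYPECODE ON GFO_ZAKEN.ZAAKTYPECODE_ID = GFO_ZAKEN_TYPECODE.ID
WHERE GFO_ZAKEN_KOSTEN.ID = 13;

RESET enable_mergejoin;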

I attached an 800 KB test DB with all sensitive data removed.

I tried both the default config without changes and the default config with our settings appended:

max_connections = 1000
shared_buffers = 512MB
effective_cache_size = 6GB
work_mem = 10485kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1
wal_sync_method = open_datasync
fsync = on
synchronous_commit = off
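
To rule out configuration drift between the two setups, the effective values can be checked from within the database via the pg_settings view:

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('random_page_cost', 'work_mem',
               'default_statistics_target', 'effective_cache_size');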

Doing a VACUUM ANALYZE shouldn't turn a straightforward plan into one that is thousands of times slower.

Regards,

Jan Kort

Attachment: trialdb.zip (application/x-zip-compressed, 807.0 KB)
