Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jan Kort <jan(dot)kort(at)genetics(dot)nl>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date: 2021-05-18 16:54:51
Message-ID: CAFj8pRC9BoX1C45n0jFEYmCAwBj5nQoiy1_iukA4S5-kS9yM+w@mail.gmail.com
Lists: pgsql-bugs

On Tue, May 18, 2021 at 18:22, Jan Kort <jan(dot)kort(at)genetics(dot)nl> wrote:

> Hi Pavel,
>
> Thank you for the reply.
>
> The query is normally efficient.
>
> I have removed more of the clutter; for example, the indexes aren't
> important in this specific case:
>
> DROP INDEX TRIAL.GFO_ZAKENO18;
> DROP INDEX TRIAL.GFO_ZAKEN_KOSTENO14;
>
> The GFO_ZAKEN_TYPECODE part isn't important either; I can reproduce it
> with just:
>
> EXPLAIN ANALYZE SELECT *
> FROM TRIAL.GFO_ZAKEN_KOSTEN
> JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
> WHERE GFO_ZAKEN_KOSTEN.ID = 13
>
> So it should always pick the GFO_ZAKEN_KOSTEN.ID = 13 row first and
> follow the referential constraint in the JOIN.
>
> And in most cases it does.
>
> But not in the specific case I have, with just 3 GFO_ZAKEN_KOSTEN
> records.
>
> The slightest change in the data makes the query behave normally;
> execution time is 0.050 ms again.
>
> For example, if I add 1 record with:
>
> insert into trial.gfo_zaken_kosten (id, gfo_zaken_id) values (1000, 1000);
>
> So you now have GFO_ZAKEN_KOSTEN:
>
> ID    GFO_ZAKEN_ID
> 11    98
> 12    98
> 13    1049601
> 1000  1000
>
> After a VACUUM FULL ANALYZE the query goes back to normal:
>
> Nested Loop  (cost=0.43..9.50 rows=1 width=16) (actual time=0.038..0.039 rows=1 loops=1)
>   ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.05 rows=1 width=8) (actual time=0.033..0.034 rows=1 loops=1)
>         Filter: (id = 13)
>         Rows Removed by Filter: 3
>   ->  Index Scan using gfo_zaken_pkey on gfo_zaken  (cost=0.43..8.45 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
>         Index Cond: (id = gfo_zaken_kosten.gfo_zaken_id)
> Planning Time: 0.138 ms
> Execution Time: 0.050 ms
>
> After removing the GFO_ZAKEN_KOSTEN record with ID = 1000 again and
> running VACUUM FULL ANALYZE again, it becomes slow again. Just to make
> sure, I switched between the 3 and 4 records a few times and could
> reproduce this every time. I also tried different values for ID and
> GFO_ZAKEN_ID and more than 1 extra record, but anything 4 or higher
> worked; even 1M records in GFO_ZAKEN_KOSTEN was just as efficient.
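>
> For reference, a bulk fill along these lines reproduces the "many
> records" case (a sketch; the id range and the referenced gfo_zaken_id
> of 98 are arbitrary picks on my side):
>
> -- fill gfo_zaken_kosten with ~1M extra rows, all pointing at zaak 98
> INSERT INTO trial.gfo_zaken_kosten (id, gfo_zaken_id)
> SELECT g, 98 FROM generate_series(10000, 1010000) AS g;
> VACUUM FULL ANALYZE trial.gfo_zaken_kosten;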
>
> Then I removed the GFO_ZAKEN_KOSTEN record with ID = 11, giving 3
> records; that fixed it too, so it isn't just having 3 records that is
> the problem.
>
> I put back the GFO_ZAKEN_KOSTEN record with ID = 11, giving 4 records;
> that worked.
>
> Then I removed the GFO_ZAKEN_KOSTEN record with ID = 1000, giving the
> original 3 records; it stopped working.
>
> Then I removed the GFO_ZAKEN_KOSTEN record with ID = 12, giving 2
> records; that worked.
>
> I put back the GFO_ZAKEN_KOSTEN record with ID = 12, giving 3 records;
> that stopped working.
>
> Then I removed the GFO_ZAKEN_KOSTEN record with ID = 11, giving 2
> records; that worked.
>
> So my guess is that having 3 records, where 2 reference the same
> GFO_ZAKEN row, causes the planner to choose a different route.
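>
> Condensed, one round of the toggle looks like this (a sketch against
> the attached DB, using the two-table query from above):
>
> -- 4 records: fast plan
> insert into trial.gfo_zaken_kosten (id, gfo_zaken_id) values (1000, 1000);
> VACUUM FULL ANALYZE trial.gfo_zaken_kosten;
> EXPLAIN ANALYZE SELECT *
> FROM TRIAL.GFO_ZAKEN_KOSTEN
> JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
> WHERE GFO_ZAKEN_KOSTEN.ID = 13;
>
> -- back to 3 records: slow plan again
> delete from trial.gfo_zaken_kosten where id = 1000;
> VACUUM FULL ANALYZE trial.gfo_zaken_kosten;
> EXPLAIN ANALYZE SELECT *
> FROM TRIAL.GFO_ZAKEN_KOSTEN
> JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
> WHERE GFO_ZAKEN_KOSTEN.ID = 13;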
>

I have no idea, but maybe the sampling in ANALYZE doesn't work well when
there is very low cardinality.

You can check whether the related column statistics have valid values or not:

https://pages.saclay.inria.fr/maxime.buron/posts/checking-postgres-statistics.html
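
For example, something like this (a sketch; pg_stats is generic, so
adjust the schema and table names to your setup) shows what ANALYZE
recorded for the join columns:

-- per-column statistics as seen by the planner
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'trial'
  AND tablename IN ('gfo_zaken', 'gfo_zaken_kosten', 'gfo_zaken_typecode');

If the numbers look wrong, one thing to try (not guaranteed to help
here) is a bigger sample for the column, followed by a re-analyze:

ALTER TABLE trial.gfo_zaken ALTER COLUMN zaaktypecode_id SET STATISTICS 10000;
ANALYZE trial.gfo_zaken;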

Regards

Pavel

>
> Kind regards,
>
> Jan Kort | Helpdesk Employee
> R&D, Projects and Helpdesk Department
>
> E jan(dot)kort(at)genetics(dot)nl
> I www.genetics.nl
> T (+31) 36 54 00 850
> M (+31) 6 485 138 32
> A Postbus 1268, 1300 BG Almere
> B Bouwmeesterweg 8, 1333 LC Almere
>
> From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> Sent: Tuesday, May 18, 2021 16:02
> To: Jan Kort <jan(dot)kort(at)genetics(dot)nl>
> CC: pgsql-bugs(at)lists(dot)postgresql(dot)org
> Subject: Re: Query with straightforward plan changes and becomes 6520
> times slower after VACUUM ANALYZE
>
> Hi
>
> On Tue, May 18, 2021 at 15:36, Jan Kort <jan(dot)kort(at)genetics(dot)nl> wrote:
>
> Hi,
>
> I tried this on both PostgreSQL 12.7 and 13.3, with both VACUUM ANALYZE
> and VACUUM FULL ANALYZE; both change a straightforward plan into a
> reversed, complex plan that makes the query slow. The only thing that
> helps is a pg_dump and pg_restore, but that is not feasible.
>
> It seems to depend on the specific data. I tried generating other
> variations of around 1M records, but so far the attached DB (also
> generated) is the only one where I could reproduce it; I can't figure
> out what is so special about this data.
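>
> (For what it's worth, the variations were along the lines of this
> sketch; this is not the exact generator behind the attached DB, and it
> assumes a typecode with ID 1 already exists in GFO_ZAKEN_TYPECODE:
>
> INSERT INTO trial.gfo_zaken (id, zaaktypecode_id)
> SELECT g, CASE WHEN g <= 9 THEN 1 ELSE NULL END
> FROM generate_series(1, 1048587) AS g;
>
> which matches the counts below: ~1M rows, 9 of them with a non-null
> ZAAKTYPECODE_ID.)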
>
> Given the following record counts:
>
> GFO_ZAKEN: 1048587
> GFO_ZAKEN_KOSTEN: 3
> GFO_ZAKEN_TYPECODE: 4
> GFO_ZAKEN.ZAAKTYPECODE_ID has a value 9 times and the rest is null
>
> When I run this query:
>
> EXPLAIN ANALYZE
> SELECT GFO_ZAKEN_KOSTEN.ID AS GFO_ZAKEN_KOSTEN_ID,
>        GFO_ZAKEN.ID AS GFO_ZAKEN_ID,
>        GFO_ZAKEN_TYPECODE.OMSCHRIJVING AS ZAAKTYPECODE_ID
> FROM TRIAL.GFO_ZAKEN_KOSTEN
> JOIN TRIAL.GFO_ZAKEN ON GFO_ZAKEN_KOSTEN.GFO_ZAKEN_ID = GFO_ZAKEN.ID
> JOIN TRIAL.GFO_ZAKEN_TYPECODE ON GFO_ZAKEN.ZAAKTYPECODE_ID = GFO_ZAKEN_TYPECODE.ID
>
> It takes 500 ms or so, which I associate with a full table scan, even
> though these are just simple referential constraints with corresponding
> indexes:
>
> CREATE TABLE TRIAL.GFO_ZAKEN_TYPECODE (ID INTEGER PRIMARY KEY,
>   OMSCHRIJVING CHARACTER VARYING(4000));
>
> CREATE TABLE TRIAL.GFO_ZAKEN (ID INTEGER PRIMARY KEY,
>   ZAAKTYPECODE_ID INTEGER,
>   CONSTRAINT ZAAKTYPECODE_IDC1 FOREIGN KEY (ZAAKTYPECODE_ID)
>     REFERENCES TRIAL.GFO_ZAKEN_TYPECODE (ID));
>
> CREATE INDEX GFO_ZAKENO18 ON TRIAL.GFO_ZAKEN USING BTREE (ZAAKTYPECODE_ID);
>
> CREATE TABLE TRIAL.GFO_ZAKEN_KOSTEN (ID INTEGER PRIMARY KEY,
>   GFO_ZAKEN_ID INTEGER,
>   CONSTRAINT GFO_ZAKEN_IDC1 FOREIGN KEY (GFO_ZAKEN_ID)
>     REFERENCES TRIAL.GFO_ZAKEN (ID));
>
> CREATE INDEX GFO_ZAKEN_KOSTENO14 ON TRIAL.GFO_ZAKEN_KOSTEN USING BTREE
>   (GFO_ZAKEN_ID ASC NULLS LAST);
>
>
> After pg_restore it gives a straightforward plan, starting with the
> gfo_zaken_kosten primary key and continuing with the join on
> gfo_zaken_kosten.gfo_zaken_id:
>
> Nested Loop  (cost=0.56..3.64 rows=1 width=524) (actual time=0.036..0.037 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.43..3.48 rows=1 width=12) (actual time=0.030..0.030 rows=1 loops=1)
>         ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=1)
>               Filter: (id = 13)
>               Rows Removed by Filter: 2
>         ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..2.45 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
>               Index Cond: (id = gfo_zaken_kosten.gfo_zaken_id)
>   ->  Index Scan using gfo_zaken_typecodep on gfo_zaken_typecode  (cost=0.13..0.15 rows=1 width=520) (actual time=0.005..0.005 rows=1 loops=1)
>         Index Cond: (id = gfo_zaken.zaaktypecode_id)
> Planning Time: 1.538 ms
> Execution Time: 0.095 ms
>
> After VACUUM ANALYZE the plan becomes inefficient again: it no longer
> starts with the gfo_zaken_kosten primary key but at the wrong end, with
> an index scan over 1M rows:
>
> Merge Join  (cost=1.48..1.59 rows=1 width=159) (actual time=619.374..619.376 rows=1 loops=1)
>   Merge Cond: (gfo_zaken.id = gfo_zaken_kosten.gfo_zaken_id)
>   ->  Nested Loop  (cost=0.43..96503.47 rows=1048587 width=155) (actual time=0.022..619.359 rows=9 loops=1)
>         Join Filter: (gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id)
>         Rows Removed by Join Filter: 4194316
>         ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..33587.23 rows=1048587 width=8) (actual time=0.006..141.167 rows=1048587 loops=1)
>         ->  Materialize  (cost=0.00..1.06 rows=4 width=155) (actual time=0.000..0.000 rows=4 loops=1048587)
>               ->  Seq Scan on gfo_zaken_typecode  (cost=0.00..1.04 rows=4 width=155) (actual time=0.011..0.012 rows=4 loops=1)
>   ->  Sort  (cost=1.05..1.05 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
>         Sort Key: gfo_zaken_kosten.gfo_zaken_id
>         Sort Method: quicksort  Memory: 25kB
>         ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
>               Filter: (id = 13)
>               Rows Removed by Filter: 2
> Planning Time: 69.151 ms
> Execution Time: 619.410 ms
>
> This is not a bug. You can see a very bad estimation on the join:
> Postgres expects each value to have the same probability in both
> tables, which in your case is not true. Unfortunately, Postgres does
> not have multi-table statistics, so there is no easy solution. Usually
> you need to divide your query in two. And maybe you can check your
> data to see why the predicate gfo_zaken.zaaktypecode_id =
> gfo_zaken_typecode.id is almost invalid.
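>
> For example, the divided query could look like this (only a sketch;
> the second statement uses the gfo_zaken_id returned by the first):
>
> -- step 1: find the zaak for the kosten row
> SELECT gfo_zaken_id FROM trial.gfo_zaken_kosten WHERE id = 13;
>
> -- step 2: look up the typecode for that single zaak
> SELECT gfo_zaken.id, gfo_zaken_typecode.omschrijving
> FROM trial.gfo_zaken
> JOIN trial.gfo_zaken_typecode
>   ON gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id
> WHERE gfo_zaken.id = 1049601;  -- the value returned by step 1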
>
> Regards
>
> Pavel Stehule
>
> So it is 6520 times slower after VACUUM ANALYZE than before.
>
> Doing VACUUM ANALYZE again doesn't make it better; only pg_dump +
> pg_restore brings back the original plan, and then it breaks again on
> the first VACUUM ANALYZE.
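>
> For reference, the dump/restore cycle I use is roughly this (the
> database names are placeholders):
>
> pg_dump -Fc -d mydb -n trial -f trial.dump
> pg_restore -d mydb_restored trial.dump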
>
> I attached an 800K test DB with all sensitive data removed.
>
> I tried both the default config without changes and the default config
> with our settings appended:
>
> max_connections = 1000
> shared_buffers = 512MB
> effective_cache_size = 6GB
> work_mem = 10485kB
> maintenance_work_mem = 512MB
> min_wal_size = 1GB
> max_wal_size = 2GB
> checkpoint_completion_target = 0.7
> wal_buffers = 16MB
> default_statistics_target = 100
> random_page_cost = 1
> wal_sync_method = open_datasync
> fsync = on
> synchronous_commit = off
>
>
>
> Doing a VACUUM ANALYZE shouldn’t change a straightforward plan.
>
> Regards,
>
> Jan Kort
