Duplicate WHERE condition changes performance and plan

From: "singh400(at)gmail(dot)com" <singh400(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Duplicate WHERE condition changes performance and plan
Date: 2020-04-15 19:55:53
Message-ID: CAOtbvRKSSg5VOY1tE7h4tM09J=etjNWyM8cMm=G-cRk7=KyeEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

We have an odd issue where specifying the same where clause twice causes PG
to pick a much more efficent plan. We would like to know why.

Query A (this is the 'slow' query):
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND processed = true;

Query B (this is the 'fast' query):
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND problem.status != 2
AND processed = true;

The EXPLAIN ANALYZE for both queries can be found here:-
Query A: https://explain.depesz.com/s/lFuy
Query B: https://explain.depesz.com/s/Jqmv

The table definitions (including the indexes) can be found here:-
public.problem:
https://gist.github.com/indy-singh/e90ee6d23d053d32c2564501720353df
public.problem_instance:
https://gist.github.com/indy-singh/3c77096b91c89428752cf314d8e20286

Data stats:-
public.problem has around 10,000 rows and once the condition status != 2 is
applied there are around 800 rows left.
public.problem_instance has around 592,000 rows and once the condition
processed = true is applied there are around 370,000 rows left.

PG version:
PostgreSQL 9.5.19 on x86_64-pc-linux-gnu (Debian 9.5.19-1.pgdg90+1),
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

-- SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
relname='TABLE_NAME'
Table metadata:-
public.problem:
https://gist.github.com/indy-singh/ff34a3b6e45432ea4be2bf0b5038e0be
public.problem_instance:
https://gist.github.com/indy-singh/a09fe66c8a8840b7661ce9726ebcab71

Last Vacuum:-
public.problem: 2020-04-14 23:11:47.51056+01
public.problem_instance: 2020-04-14 20:11:04.187138+01

Last Analyze:
public.problem: 2020-04-14 23:11:47.592878+01
public.problem_instance: 2020-04-14 20:11:04.508432+01

Server Configuration:
https://gist.github.com/indy-singh/8386d59206af042d365e5cd49fbae68f

I tried my best getting all the information up front, please let me know if
I missed anything.

Thanks,
Indy

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Carboni 2020-04-15 22:53:58 Using unlogged tables for web sessions
Previous Message Don Seiler 2020-04-13 15:08:47 Re: High kswapd