Unworkable plan above certain row count

From: André Hänsel <andre(at)webkr(dot)de>
To: <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Unworkable plan above certain row count
Date: 2022-04-28 00:52:57
Message-ID: 035001d85a9a$4dcfd840$e96f88c0$@webkr.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I noticed an issue in a simple query with WHERE NOT IN (SELECT ...). I am
aware that anti-joins with NOT IN are currently not optimized and should be
rewritten as WHERE NOT EXISTS (SELECT ...), so if this is irrelevant please
just ignore it.

Here is a setup that works:

CREATE TABLE a
(
a_id serial NOT NULL,
PRIMARY KEY (a_id)
);
CREATE TABLE b
(
b_id serial NOT NULL,
a_id int NOT NULL,
PRIMARY KEY (b_id)
);

INSERT INTO a(a_id) SELECT generate_series(1, 20000);
INSERT INTO b(b_id, a_id) SELECT generate_series(1, 500000), floor(random()
* 22000 + 1)::int;

ANALYZE a;
ANALYZE b;

EXPLAIN SELECT count(*) FROM b WHERE a_id NOT IN (SELECT a_id FROM a);

Finalize Aggregate (cost=7596.23..7596.24 rows=1 width=8)
-> Gather (cost=7596.12..7596.23 rows=1 width=8)
Workers Planned: 1
-> Partial Aggregate (cost=6596.12..6596.13 rows=1 width=8)
-> Parallel Seq Scan on b (cost=339.00..6228.47 rows=147059
width=0)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on a (cost=0.00..289.00 rows=20000
width=4)

Fiddle:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=497ab1d5eec6e02d4d1c0f6630b6f1
f1

Now if you change
INSERT INTO a(a_id) SELECT generate_series(1, 20000);
to
INSERT INTO a(a_id) SELECT generate_series(1, 200000);
i.e. add a zero, the plan becomes this:

Finalize Aggregate (cost=759860198.41..759860198.42 rows=1 width=8)
-> Gather (cost=759860198.29..759860198.40 rows=1 width=8)
Workers Planned: 1
-> Partial Aggregate (cost=759859198.29..759859198.30 rows=1
width=8)
-> Parallel Seq Scan on b (cost=0.00..759858830.65
rows=147059 width=0)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..4667.00 rows=200000
width=4)
-> Seq Scan on a (cost=0.00..2885.00
rows=200000 width=4)

Fiddle:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=bec018196195635cb6ec05ccae3213
7c

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2022-04-28 02:08:14 Re: Unworkable plan above certain row count
Previous Message Mickael van der Beek 2022-04-27 14:54:35 Re: Array of integer indexed nested-loop semi join