| From: | PG Bug reporting form <noreply(at)postgresql(dot)org> | 
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Cc: | joan(at)sanchezsabe(dot)com | 
| Subject: | BUG #18851: Queries with xxx NOT IN (SELECT xxx FROM table) fail to run (or run very slowly) on v17 (v14 ok) | 
| Date: | 2025-03-14 16:16:06 | 
| Message-ID: | 18851-5a0a9801290135d7@postgresql.org | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs | 
The following bug has been logged on the website:
Bug reference:      18851
Logged by:          Joan Sánchez Sabé
Email address:      joan(at)sanchezsabe(dot)com
PostgreSQL version: 17.4
Operating system:   Windows 11 24H2 (26100.3194)
Description:        
I have experienced a misbehaviour (IMHO) in PostgreSQL 17 that didn't occur
in PostgreSQL 14.
Minimum reproducible scenario to show the error:
1) Table with a primary key (integer) and 500_000 records on it.
CREATE TABLE t1 AS
(
    SELECT
        id
    FROM
        generate_series(1, 1000000) AS s(id)
    ORDER BY
        random()
    LIMIT
        500000
) ;
ALTER TABLE t1
    ADD PRIMARY KEY (id);
ANALYZE t1 ;
2) Second table, with a smaller (200_000) number of values of the same type,
some of them overlapping the first table, without a primary key (nor any
index).
CREATE TABLE t2 AS
(
    SELECT
        id
    FROM
        generate_series(1, 1000000) AS s(id)
    ORDER BY
        random()
    LIMIT
        200000
) ;
3) The following query, which just looks for values in t2 not in t1, takes
about 200 ms on my computer using "PostgreSQL 14.17, compiled by Visual C++
build 1942, 64-bit"; running on Windows 11 Pro 24H2 (26100.3194).
SELECT
    t2.id
FROM
    t2
WHERE
    t2.id NOT IN (SELECT id FROM t1) ;
Output: a random list of around 100_000 numbers.
   
The query plan is as follows:
    Seq Scan on public.t2  (cost=8463.00..11848.00 rows=100000 width=4)
(actual time=122.093..186.668 rows=99566 loops=1)
      Output: t2.id
      Filter: (NOT (hashed SubPlan 1))
      Rows Removed by Filter: 100434
      SubPlan 1
        ->  Seq Scan on public.t1  (cost=0.00..7213.00 rows=500000 width=4)
(actual time=0.017..20.573 rows=500000 loops=1)
              Output: t1.id
    Planning Time: 0.173 ms
    Execution Time: 192.002 ms
You can check a (smallish) version on https://dbfiddle.uk/aHTC8YQd
--------
The same query, executed on "PostgreSQL 17.4 on x86_64-windows, compiled by
msvc-19.42.34436, 64-bit)" takes "forever". I actually gave up after 10
minutes.
The query plan was:
    Seq Scan on public.t2  (cost=0.00..1294403396.00 rows=100000 width=4)
      Output: t2.id
      Filter: (NOT (ANY (t2.id = (SubPlan 1).col1)))
      SubPlan 1
        ->  Materialize  (cost=0.00..11694.00 rows=500000 width=4)
              Output: t1.id
              ->  Seq Scan on public.t1  (cost=0.00..7240.00 rows=500000
width=4)
                    Output: t1.id
The same query works on smaller tables (sizes 100_000 and 40_000,
respectively; i.e.: 5 x smaller). In that case, the execution plan is the
same as with PostgreSQL 14 (with values divided by aprox. 5, as expected)
    Seq Scan on public.t2  (cost=1698.00..2502.00 rows=24480 width=4)
(actual time=20.813..28.825 rows=20016 loops=1)
      Output: t2.id
      Filter: (NOT (ANY (t2.id = (hashed SubPlan 1).col1)))
      Rows Removed by Filter: 19984
      SubPlan 1
        ->  Seq Scan on public.t1  (cost=0.00..1448.00 rows=100000 width=4)
(actual time=0.013..4.844 rows=100000 loops=1)
              Output: t1.id
    Planning Time: 0.205 ms
    Execution Time: 30.565 ms
So, it seems that the problem appears when the execution planner adds the
"Materialize" node.
If necessary, I can add configuration files for both versions of
PostgreSQL.
Many thanks in advance
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2025-03-14 16:18:42 | Re: ISN extension - wrong volatility level for isn_weak() function | 
| Previous Message | Viktor Holmberg | 2025-03-14 16:08:36 | Re: ISN extension - wrong volatility level for isn_weak() function |