Re: BUG #18766: not exists sometimes gives too few records

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: jan(dot)kort(at)genetics(dot)nl, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #18766: not exists sometimes gives too few records
Date: 2025-01-04 11:43:28
Message-ID: 90f8e2c1-7a06-44b9-aaf5-db9bc914bbd1@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Jan,

Please always respond to all, so that it goes to the mailing list too,
not just to me directly. I'll respond to the list this time, but it
makes the thread harder to follow.

On 1/4/25 12:11, Jan Kort wrote:
> Hi Tomas,
>
> You are right that without the index it does produce the correct result
> (0 records), but how do I uncorrupt the index? I tried the following:
>
> * Dropping and recreating the 2 indexes in the explain
> * Dropping and recreating all indexes on the the 2 tables
> * Vacuum analyze advies_type
> * Vacuum analyze advies_hoofd
>
> But it still produces the wrong result (1 record)
>

To "uncorrupt" the index is to rebuild/reindex it. But you already did
that and it didn't help, so it seems more like a bug in Postgres than a
data corruption.

> The explain of the query on the malfunctioning database is:
>
> HashSetOp Except (cost=0.14..29.91 rows=9 width=364)
> -> Append (cost=0.14..29.13 rows=24 width=364)
> -> Subquery Scan on "*SELECT* 1" (cost=0.14..26.62 rows=9
> width=149)
> -> Index Scan using advies_typeo12 on advies_type
> (cost=0.14..26.53 rows=9 width=145)
> Index Cond: ((conditional)::text = 'N'::text)
> Filter: ((owners_id = ANY ('{1,32,25}'::integer[]))
> AND (NOT (SubPlan 1)))
> SubPlan 1
> -> Index Scan using advies_hoofdo16 on
> advies_hoofd advies_hoofd_1 (cost=0.15..2.37 rows=1 width=4)
> Index Cond: (advies_type = advies_type.id
> <http://advies_type.id>)
> Filter: (owners_id = ANY
> ('{1,32,25}'::integer[]))
> -> Subquery Scan on "*SELECT* 2" (cost=1.86..2.39 rows=15
> width=149)
> -> Merge Right Anti Join (cost=1.86..2.24 rows=15
width=145)
> Merge Cond: (advies_hoofd.advies_type =
> advies_type_1.id <http://advies_type_1.id>)
> -> Index Scan using advies_hoofdo16 on advies_hoofd
> (cost=0.15..18.01 rows=293 width=4)
> Filter: (owners_id = ANY
('{1,32,25}'::integer[]))
> -> Sort (cost=1.72..1.76 rows=18 width=145)
> Sort Key: advies_type_1.id <http://
> advies_type_1.id>
> -> Seq Scan on advies_type advies_type_1
> (cost=0.00..1.34 rows=18 width=145)
> Filter: (((conditional)::text =
> 'N'::text) AND (owners_id = ANY ('{1,32,25}'::integer[])))
>
> On the working database it says:
>
> HashSetOp Except (cost=0.00..17.03 rows=9 width=364)
> -> Append (cost=0.00..16.19 rows=26 width=364)
> -> Subquery Scan on "*SELECT* 1" (cost=0.00..13.20 rows=9
> width=149)
> -> Seq Scan on advies_type (cost=0.00..13.11 rows=9
> width=145)
> Filter: (((conditional)::text = 'N'::text) AND
> (owners_id = ANY ('{1,32,25}'::integer[])) AND (NOT (SubPlan 1)))
> SubPlan 1
> -> Seq Scan on advies_hoofd advies_hoofd_1
> (cost=0.00..1.11 rows=1 width=4)
> Filter: ((advies_type = advies_type.id
> <http://advies_type.id>) AND (owners_id = ANY ('{1,32,25}'::integer[])))
> -> Subquery Scan on "*SELECT* 2" (cost=1.57..2.85 rows=17
> width=149)
> -> Hash Right Anti Join (cost=1.57..2.68 rows=17
width=145)
> Hash Cond: (advies_hoofd.advies_type =
> advies_type_1.id <http://advies_type_1.id>)
> -> Seq Scan on advies_hoofd (cost=0.00..1.10
> rows=7 width=4)
> Filter: (owners_id = ANY
('{1,32,25}'::integer[]))
> -> Hash (cost=1.34..1.34 rows=18 width=145)
> -> Seq Scan on advies_type advies_type_1
> (cost=0.00..1.34 rows=18 width=145)
> Filter: (((conditional)::text =
> 'N'::text) AND (owners_id = ANY ('{1,32,25}'::integer[])))
>
> So they are quite different, that gave me an idea why my local database
> wasn't malfunctioning. I inserted 500 dummy records into advies_hoofd
> (without a reference to advies_type).
>
Right. And the "broken" query used an index scan, while the working one
does not. So that's another hint it's some thinko in using the index.

Another thing you might try is disabling the merge join by

SET enable_mergejoin = off;

or even the index scans (enable_indexscan=off), in which case the query
should probably flip to the hashjoin plan. Then you can compare the
EXPLAIN ANALYZE for the two plans (mergejoin returning the wrong number
of results, and hashjoin returning the correct number). The plans do
align pretty well, except for the join type, so comparing the "rows"
should tell you which of the branches is wrong ...

From the two plans it seems the first Append subplan keeps returning 9
rows, so it's probably the hashjoin - I wonder if it might be about the
advies_hoofdo16 index used with the ANY() clause. Hard to say, I don't
know if there's some difference in the data. But you might try dropping
that index, and see if that fixes it.

If yes, I wonder if the result of

SELECT * FROM ADVIES_HOOFD WHERE OWNERS_ID IN (1,32,25)

changes depending on the plan type (if you force it to use / not use the
index). That'd be a much simpler query to investigate.

> After that it was still giving 0 records, but then I ran:
>
> vacuum analyze advies_hoofd
> vacuum analyze advies_type
>
> When I run the query now I get 1 record and it's the 1000001002 record,
> so now my local database is malfunctioning too and apparently in the
> same way as the other database. I don't see anything special about this
> record, it's somewhere in the middle.
>
> At least I know how to break it now, I will work this into a standalone
> testcase monday.
>
> Regards,
>
> Jan

Thanks!

Tomas

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2025-01-04 12:15:11 Re: BUG #18766: not exists sometimes gives too few records
Previous Message Kirill Reshke 2025-01-04 07:14:42 Re: BUG #18763: pg_get_viewdef returns an old def with new relation name