From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | s(dot)p(dot)e(at)gmx-topmail(dot)de |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Jeff Davis <jdavis(at)postgresql(dot)org> |
Subject: | Re: Postgress 13.x: wrong result for delete with subquery |
Date: | 2021-01-29 22:26:17 |
Message-ID: | 2828817.1611959177@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
s(dot)p(dot)e(at)gmx-topmail(dot)de writes:
> I found a dataset (attachment bug13.csv); which produces a wrong result on postgreSQL 13 for a special delete command:
Yeah. "git bisect" pins this on Jeff's commit 230230223, and with a bit
of digging it's not hard to see what the problem is. The plan ends up
as a hash join on the varchar columns:
Delete on pg_temp_3.t1 (cost=853.22..1730.13 rows=0 width=0)
-> Hash Join (cost=853.22..1730.13 rows=7289 width=12)
Output: t1.ctid, t2.ctid
Inner Unique: true
Hash Cond: (((t1.id1)::text = (t2.id1)::text) AND ((t1.id2)::text = (t2.id2)::text))
-> Seq Scan on pg_temp_3.t1 (cost=0.00..634.55 rows=29155 width=74)
Output: t1.ctid, t1.id1, t1.id2
-> Hash (cost=809.48..809.48 rows=2916 width=74)
Output: t2.ctid, t2.id1, t2.id2
-> HashAggregate (cost=780.32..809.48 rows=2916 width=74)
Output: t2.ctid, t2.id1, t2.id2
Group Key: (t2.id1)::text, (t2.id2)::text
-> Seq Scan on pg_temp_3.t2 (cost=0.00..634.55 rows=29155 width=74)
Output: t2.ctid, t2.id1, t2.id2, t2.id1, t2.id2
Notice that the hashagg's input relation produces five columns,
ctid, id1, id2, id1::text, id2::text (EXPLAIN doesn't show the implicit
casts on the last two, which are the hashing columns). find_hash_columns
decides that only the first three of these need be spilled, which ends up
making the hash keys NULL in reloaded tuples. The only astonishing thing
about this test case is that just one tuple fails to be joined.
So fundamentally, this patch confused "Vars in the qual trees" with
"input columns that we might access", which is just wrong. In some
cases, the input columns represent expressions not plain Vars.
We might be able to salvage this by having find_hash_columns examine
the Agg node's grpColIdx list and assume that columns listed there need
to be preserved. But frankly, now that I've seen this case, I'm not
sure that there's anything correct about the approach being used.
We might be well advised to just revert 230230223 and think harder.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Cadstructure Technology | 2021-01-30 09:37:57 | Unable to installed postgis extension using stack builder |
Previous Message | Tom Lane | 2021-01-29 17:55:42 | Re: BUG #16843: pg_upgrade from 12.5 to 13.1 with extension plperlu failed |