From: | Don Seiler <don(at)seiler(dot)us> |
---|---|
To: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Dangerous Naming Confusion |
Date: | 2021-03-29 22:00:34 |
Message-ID: | CAHJZqBBxQhaSFuHKhR-sp95ibxibad+oaBoJv=1wVO-1h366eg@mail.gmail.com |
Lists: | pgsql-general |
Good evening,
Please see my gist at
https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056 for a
complete test case.
I tested this on PG 12.6 and 13.2 and observed the same on both.
We were expecting the queries that use dts_temp to return only 3 rows.
However, the subquery starting at line 36 of the gist returns ALL 250,000
rows from dts_orders. Note that the "order_id" field doesn't exist in the
dts_temp table, so I'm assuming PG is using the "order_id" field from the
dts_orders table. If I use explicit table references, as in the query at
line 48, then I get the error I would expect: the "order_id" column
doesn't exist in dts_temp.
When I use the actual column name "a" for dts_temp, then I get the 3 rows
back as expected.
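The three cases above can be reproduced in miniature. This is only a sketch, not the gist itself: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite resolves an unqualified column name against the outer query in the same way), and the table sizes (10 orders, 3 temp rows) and the alias "t" are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; row counts are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dts_orders (order_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE dts_temp (a INTEGER)")
conn.executemany("INSERT INTO dts_orders VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO dts_temp VALUES (?)", [(2,), (4,), (6,)])


def count_orders(subquery):
    """Count dts_orders rows whose order_id is IN the given subquery."""
    sql = f"SELECT count(*) FROM dts_orders WHERE order_id IN ({subquery})"
    return conn.execute(sql).fetchone()[0]


# Unqualified name: dts_temp has no order_id, so the name binds to the
# outer dts_orders.order_id and every outer row matches itself.
unqualified = count_orders("SELECT order_id FROM dts_temp")

# Qualified name: the reference can only mean dts_temp, so it fails.
try:
    count_orders("SELECT t.order_id FROM dts_temp t")
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

# Real column name: only the matching rows come back.
correct = count_orders("SELECT a FROM dts_temp")

print(unqualified, qualified_error, correct)
```

The unqualified form counts every row in dts_orders, the qualified form raises an error, and the form using "a" counts only the three matching rows.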
I'm wondering if it is expected behavior that PG uses the
dts_orders.order_id value in the subquery "select order_id from dts_temp"
when dts_temp doesn't have its own order_id column. I would have expected
an error that the column doesn't exist. It seems very counterintuitive
that PG would use a column from a different table.
This issue was discovered today when this logic was used in an UPDATE that
ended up locking all rows in a 5M-row table and brought many apps to a
grinding halt. Thankfully it was caught and killed before it actually
updated anything.
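One possible guard for an UPDATE like this is to run it in a transaction and commit only if the affected row count matches what was expected. The sketch below again uses Python's sqlite3 as a stand-in for PostgreSQL; the "status" column, the helper guarded_update, and the row counts are hypothetical. Note that in PostgreSQL the row locks would still be held until the rollback, so this limits the damage rather than avoiding the lock-up; qualifying the column names is what catches the mistake before anything runs.

```python
import sqlite3

# Hypothetical schema: "status" is invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dts_orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE dts_temp (a INTEGER)")
conn.executemany("INSERT INTO dts_orders VALUES (?, 'open')", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO dts_temp VALUES (?)", [(2,), (4,), (6,)])
conn.commit()


def guarded_update(subquery, expected_rows):
    """Run the UPDATE, then commit only if the row count is as expected."""
    cur = conn.execute(
        f"UPDATE dts_orders SET status = 'closed' WHERE order_id IN ({subquery})"
    )
    touched = cur.rowcount
    if touched == expected_rows:
        conn.commit()
    else:
        conn.rollback()  # discard the unexpectedly broad update
    return touched


def closed_count():
    return conn.execute(
        "SELECT count(*) FROM dts_orders WHERE status = 'closed'"
    ).fetchone()[0]


# Wrong column name: every row is touched, so the update is rolled back.
bad = guarded_update("SELECT order_id FROM dts_temp", expected_rows=3)
closed_after_bad = closed_count()

# Correct column name: exactly three rows are touched and committed.
good = guarded_update("SELECT a FROM dts_temp", expected_rows=3)
closed_after_good = closed_count()

print(bad, closed_after_bad, good, closed_after_good)
```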
Thanks,
Don.
--
Don Seiler
www.seiler.us