BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: scknoblich(at)gmail(dot)com
Subject: BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery
Date: 2024-03-18 23:23:44
Message-ID: 18399-17778030a1971177@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18399
Logged by: Sawyer Knoblich
Email address: scknoblich(at)gmail(dot)com
PostgreSQL version: 16.2
Operating system: Docker image on macOS
Description:

Hello, I have the following standalone query that operates on a dataset with
a text column and attempts to find all rows where that column's text is an
integer that is under a certain value:

```
with raw_data(id, value) as (values (1, '1900'), (2, '2100'), (3, 'abc')),
integers as (select value::integer
from raw_data
where id in (select id from raw_data where value ~
'^\d+$'))
select count(*)
from integers
where value < 2000;
```

In this query the "integers" CTE attempts to find all rows where the "value"
column is able to be converted to an integer (using a subquery) and performs
that conversion, and then the main query does a simple where + count to
produce the required dataset. With this query my expectation was that the
main query would only be operating on values that had already been converted
to integers.

Running the query as-is results in `[22P02] ERROR: invalid input syntax for
type integer: "abc"`. From the query plan I can see that the optimizer has
chosen to directly replace `q.value` with `value::integer` in the outermost
where condition to be evaluated independently from the regex match, which
ends up attempting the "abc"::integer cast before its row is able to be
filtered out by the CTE's own condition:

```
Aggregate (cost=0.20..0.21 rows=1 width=8)
CTE raw_data
-> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=36)
-> Nested Loop Semi Join (cost=0.00..0.16 rows=1 width=0)
Join Filter: (raw_data.id = raw_data_1.id)
-> CTE Scan on raw_data (cost=0.00..0.08 rows=1 width=4)
Filter: ((value)::integer < 2000)
-> CTE Scan on raw_data raw_data_1 (cost=0.00..0.07 rows=1
width=4)
Filter: (value ~ '^\d+$'::text)
```

I have also reproduced this behavior when using a different condition on the
main query such as a simple null check, when selecting from a larger
persisted table instead of the "raw_data" CTE used here for simplicity, and
when using a subquery instead of the "integers" CTE. I did find that if I
remove the subquery inside the CTE's where clause and instead apply the
regex directly to the "value" column then I get the following query plan
which succeeds with no issues:

```
with raw_data(id, value) as (values (1, '1900'), (2, '2100'), (3, 'abc')),
integers as (select value::integer
from raw_data
where value ~ '^\d+$')
select count(*)
from integers
where value < 2000;
----------
Aggregate (cost=0.07..0.08 rows=1 width=8)
-> Values Scan on "*VALUES*" (cost=0.00..0.07 rows=1 width=0)
Filter: ((column2 ~ '^\d+$'::text) AND ((column2)::integer <
2000))
```

My expectation was that the query should behave as if the CTE (or subquery
if I had used that instead) has fully executed before moving on to the main
query and this behavior breaks that assumption (which may or may not be
correct). I attempted to search the docs for anything related to this,
either CTE/subquery execution order guarantees or hoisting of casts, but I
couldn't find anything about it. If this turns out not to a bug and is just
expected behavior with existing documentation could you please link me to
the relevant page for future reference?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-03-19 02:16:35 RE: Potential data loss due to race condition during logical replication slot creation
Previous Message Thomas Munro 2024-03-18 22:48:50 Re: Regression tests fail with musl libc because libpq.so can't be loaded