
Invalid optimization of VOLATILE function in WHERE clause?

From:	Florian(dot)Schoppmann(at)emc(dot)com (Florian Schoppmann)
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Invalid optimization of VOLATILE function in WHERE clause?
Date:	2012-09-18 07:13:50
Message-ID:	1kql9l0.13pb0wa1ate2yxN%Florian.Schoppmann@emc.com
Lists:	pgsql-hackers

Hi all,

In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source, (
        SELECT
            count(*) AS _n
        FROM source
    ) AS _stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
-->8--

translates into the following query plan:

--8<--
 Nested Loop  (cost=35.00..65.03 rows=1000 width=4)
   CTE source
     ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Aggregate  (cost=25.00..25.02 rows=1 width=0)
         Filter: (random() < (5::double precision / (count(*))::double precision))
         ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=0)
   ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=4)
-->8--

In other words, because the filter is evaluated only once, on the single
row produced by the Aggregate node, the query returns either exactly 0
or all 10 rows, each with probability 0.5. Naturally, I would instead
have expected each row to be sampled independently with probability 0.5.
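To make the difference between the two outcomes concrete, here is a small Python simulation. It is not PostgreSQL code; the function names are hypothetical. It assumes that the reported plan evaluates the volatile predicate once, on the one-row aggregate, while the expected semantics evaluate it once per joined row. With _n = 10 the threshold is 5/10 = 0.5.

```python
import random

N = 10    # rows produced by generate_series(1, 10)
K = 5.0   # numerator in the predicate random() < 5 / _n

def filter_once_per_query(rows, rng):
    """Hypothetical model of the reported plan: random() is drawn a single
    time for the one-row aggregate, so either all rows survive or none do."""
    keep = rng.random() < K / len(rows)
    return list(rows) if keep else []

def filter_once_per_row(rows, rng):
    """Hypothetical model of the expected semantics: random() is drawn anew
    for every joined row, so each row is kept independently."""
    threshold = K / len(rows)
    return [r for r in rows if rng.random() < threshold]

def result_size_histogram(strategy, trials, seed=0):
    """Run a filtering strategy many times and count how often each
    result-set size occurs."""
    rng = random.Random(seed)
    rows = list(range(1, N + 1))
    counts = {}
    for _ in range(trials):
        size = len(strategy(rows, rng))
        counts[size] = counts.get(size, 0) + 1
    return counts

if __name__ == "__main__":
    for strategy in (filter_once_per_query, filter_once_per_row):
        hist = result_size_histogram(strategy, trials=10_000)
        print(strategy.__name__, dict(sorted(hist.items())))
```

The first strategy only ever yields result sizes 0 and 10, matching the observed behavior; the second yields a binomial spread of sizes centered on 5.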

Since random() is volatile, so is the whole WHERE expression. So I
wonder why the condition is pushed down to the lowest level, given that
this changes the results. Is this behavior correct, i.e., specified
somewhere? Or is this a bug?

Florian

Responses

Re: Invalid optimization of VOLATILE function in WHERE clause? at 2012-09-19 14:30:36 from Tom Lane
