Invalid optimization of VOLATILE function in WHERE clause?

From: Florian(dot)Schoppmann(at)emc(dot)com (Florian Schoppmann)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Invalid optimization of VOLATILE function in WHERE clause?
Date: 2012-09-18 07:13:50
Message-ID: 1kql9l0.13pb0wa1ate2yxN%Florian.Schoppmann@emc.com
Lists: pgsql-hackers

Hi all,

In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source, (
        SELECT
            count(*) AS _n
        FROM source
    ) AS _stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
-->8--

translates into the following query plan:

--8<--
Nested Loop  (cost=35.00..65.03 rows=1000 width=4)
  CTE source
    ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
  ->  Aggregate  (cost=25.00..25.02 rows=1 width=0)
        Filter: (random() < (5::double precision / (count(*))::double precision))
        ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=0)
  ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=4)
-->8--

In other words, the filter is evaluated once, on the Aggregate node,
rather than once per joined row. The query therefore returns either
exactly 0 or exactly 10 rows, each case with probability 0.5. Naturally,
I would have expected instead that each row is sampled independently
with probability 0.5.
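
To make the difference between the two behaviors concrete, here is a small simulation outside the database. It is only a sketch: the function names filter_per_row and filter_once are made up for illustration, and Python's random module stands in for random(). It models the two evaluation orders and says nothing about how the planner is implemented.

```python
import random

def filter_per_row(rows, p, rng):
    """Draw a fresh random number for every row (the expected behavior)."""
    return [r for r in rows if rng.random() < p]

def filter_once(rows, p, rng):
    """Draw a single random number and keep all rows or none,
    mirroring a filter attached to the one-row aggregate."""
    return list(rows) if rng.random() < p else []

rows = list(range(1, 11))      # generate_series(1,10)
p = 5.0 / len(rows)            # 5::DOUBLE PRECISION/_n = 0.5
rng = random.Random(0)         # fixed seed, only for repeatability
trials = 10000

per_row_counts = [len(filter_per_row(rows, p, rng)) for _ in range(trials)]
once_counts = [len(filter_once(rows, p, rng)) for _ in range(trials)]

# Both variants return about 5 rows on average, but the
# single-evaluation variant only ever returns 0 or 10 rows.
print("per-row sizes seen:", sorted(set(per_row_counts)))
print("once sizes seen:   ", sorted(set(once_counts)))
print("per-row mean:", sum(per_row_counts) / trials)
print("once mean:   ", sum(once_counts) / trials)
```

The two variants agree on the expected number of rows and differ only in the distribution of the result size, which is why the problem is easy to miss when looking at averages alone.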

Since random() is volatile, so is the whole WHERE expression. So I
wonder why the condition is pushed down into the aggregate subquery,
given that this changes the results. Is this behavior correct, i.e., is
it specified somewhere? Or is this a bug?

Florian
