From: | Florian(dot)Schoppmann(at)emc(dot)com (Florian Schoppmann) |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Invalid optimization of VOLATILE function in WHERE clause? |
Date: | 2012-09-18 07:13:50 |
Message-ID: | 1kql9l0.13pb0wa1ate2yxN%Florian.Schoppmann@emc.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi all,
In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query
--8<--
WITH source AS (
SELECT i FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source, (
SELECT
count(*) AS _n
FROM source
) AS _stats
WHERE
random() < 5::DOUBLE PRECISION/_n;
-->8--
translates into the following query plan:
--8<--
Nested Loop (cost=35.00..65.03 rows=1000 width=4)
CTE source
-> Function Scan on generate_series i (cost=0.00..10.00 rows=1000
width=4)
-> Aggregate (cost=25.00..25.02 rows=1 width=0)
Filter: (random() < (5::double precision / (count(*))::double
precision))
-> CTE Scan on source (cost=0.00..20.00 rows=1000 width=0)
-> CTE Scan on source (cost=0.00..20.00 rows=1000 width=4)
-->8--
In other words, the query either gives exactly 0 or 10 rows, and both
cases happen with probability 0.5. Naturally, I would have expected
instead that each row is sampled independently with probability 0.5.
Since random() is volatile, so is the whole where-expression. So I
wonder why the condition is pushed down to the lowest level, given that
this changes results. Is this behavior correct, i.e., specified
somewhere? Or is this a bug?
Florian
From | Date | Subject | |
---|---|---|---|
Next Message | Etsuro Fujita | 2012-09-18 08:52:09 | Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY |
Previous Message | Craig Ringer | 2012-09-18 07:07:05 | Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY |