From: | Tom van Tilburg <tom(dot)van(dot)tilburg(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: inconsistent behaviour of set-returning functions in sub-query with random() |
Date: | 2016-09-27 09:19:17 |
Message-ID: | CAP3PPDj3_a1fOM8EPteDj++BYivB9GS_NB8UY6Y_XBEhbP5odQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Good to know and I agree that it is not an urgent case.
I think this practice might be more common in the POSTGIS community where
there are plenty of set-returning-functions used in this way. My use was
taking a random sample of a pointcloud distrubution.
I took the liberty to post your answer at stackexchange.
thanks,
Tom
On Mon, 26 Sep 2016 at 21:38 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Tom van Tilburg <tom(dot)van(dot)tilburg(at)gmail(dot)com> writes:
> > I'm often using the WHERE clause random() > 0.5 to pick a random subset
> of
> > my data. Now I noticed that when using a set-returning function in a
> > sub-query, I either get the whole set or none (meaning that the WHERE
> > random() > 0.5 clause is interpreted *before* the set is being
> generated).
> > e.g.:
> >
> > SELECT num FROM (
> > SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num) AS foo WHERE
> random() > 0.5;
>
> Hmm, I think this is an optimizer bug. There are two legitimate behaviors
> here:
>
> SELECT * FROM unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5;
>
> should (and does) re-evaluate the WHERE for every row output by unnest().
>
> SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5;
>
> should evaluate WHERE only once, since that happens before expansion of the
> set-returning function in the targetlist. (If you're an Oracle user and
> you imagine this query as having an implicit "FROM dual", the WHERE should
> be evaluated for the single row coming out of the FROM clause.)
>
> In the case you've got here, given the placement of the WHERE in the outer
> query, you'd certainly expect it to be evaluated for each row coming out
> of the inner query. But the optimizer is deciding it can push the WHERE
> clause down to become a WHERE of the sub-select. That is legitimate in a
> lot of cases, but not when there are SRF(s) in the sub-select's
> targetlist, because that pushes the WHERE to occur before the SRF(s),
> analogously to the change between the two queries I wrote.
>
> I'm a bit hesitant to change this in existing releases. Given the lack
> of previous complaints, it seems more likely to break queries that were
> behaving as-expected than to make people happy. But we could change it
> in v10 and up, especially since some other corner-case changes in
> SRF-in-tlist behavior are afoot.
>
> In the meantime, you could force it to work as you wish by inserting the
> all-purpose optimization fence "OFFSET 0" in the sub-select:
>
> =# SELECT num FROM (
> SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num OFFSET 0) AS foo WHERE
> random() > 0.5;
> num
> -----
> 1
> 4
> 7
> 9
> (4 rows)
>
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | PHANIKUMAR G | 2016-09-27 11:12:18 | Re: need approval to join forums/community |
Previous Message | Michael Paquier | 2016-09-27 05:44:40 | Re: Frequent "pg_ctl status" removing(?) semaphores (unlikely) |
From | Date | Subject | |
---|---|---|---|
Next Message | Victor Wagner | 2016-09-27 09:19:56 | Re: Patch: Implement failover on libpq connect level. |
Previous Message | Ashutosh Bapat | 2016-09-27 09:18:21 | Re: Declarative partitioning - another take |