From: | Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Introduce array_shuffle() and array_sample() |
Date: | 2022-07-21 11:15:43 |
Message-ID: | f6267e99-4448-b47c-9fe8-ec23690b931f@aboutsource.net |
Lists: | pgsql-general pgsql-hackers |
On 21.07.22 at 10:41, Dean Rasheed wrote:
>
> A couple of quick comments on the current patch:
Thank you for your feedback!
> It's important to mark these new functions as VOLATILE, not IMMUTABLE,
> otherwise they won't work as expected in queries. See
> https://www.postgresql.org/docs/current/xfunc-volatility.html
CREATE FUNCTION marks functions as VOLATILE by default if not explicitly
marked otherwise. I assumed function definitions in pg_proc.dat have the
same behavior. I will fix that.
https://www.postgresql.org/docs/current/sql-createfunction.html
> It would be better to use pg_prng_uint64_range() rather than rand() to
> pick elements. Partly, that's because it uses a higher quality PRNG,
> with a larger internal state, and it ensures that the results are
> unbiased across the range. But more importantly, it interoperates with
> setseed(), allowing predictable sequences of "random" numbers to be
> generated -- something that's useful in writing repeatable regression
> tests.
I agree that we should use pg_prng_uint64_range(). However, to
interoperate with setseed() we would have to use drandom_seed (rather
than pg_global_prng_state) as the RNG state. drandom_seed is declared
static in float.c and used exclusively by random(). Do we want to
expose drandom_seed to other functions?
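
To make the setseed() point concrete, here is a minimal sketch in Python rather than the patch's C. It assumes a single session-level generator shared by all callers; `session_rng`, this `setseed()`, and this `array_shuffle()` are hypothetical stand-ins for illustration, not the PostgreSQL implementations. Python's `random.Random` plays the role of the seeded state, and `randrange()` plays the role of an unbiased range draw.

```python
import random

# One shared generator stands in for the session-level seed state
# (an assumption for illustration; not PostgreSQL's drandom_seed).
session_rng = random.Random()


def setseed(seed):
    """Reseed the shared generator (hypothetical stand-in for setseed())."""
    session_rng.seed(seed)


def array_shuffle(items, rng):
    """Return a shuffled copy of items using a Fisher-Yates shuffle."""
    result = list(items)
    # Walk from the last index down, swapping each slot with a
    # uniformly chosen slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform integer in [0, i]
        result[i], result[j] = result[j], result[i]
    return result


# Reseeding with the same value reproduces the same permutation,
# which is what makes repeatable regression tests possible.
setseed(42)
first = array_shuffle([1, 2, 3, 4, 5, 6], session_rng)
setseed(42)
second = array_shuffle([1, 2, 3, 4, 5, 6], session_rng)
```

If the shuffle drew from a separate, unseeded generator instead, `first` and `second` would generally differ, which is the interoperability gap described above.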
> Assuming these new functions are made to interoperate with setseed(),
> which I think they should be, then they also need to be marked as
> PARALLEL RESTRICTED, rather than PARALLEL SAFE. See
> https://www.postgresql.org/docs/current/parallel-safety.html, which
> explains why setseed() and random() are parallel restricted.
As mentioned above, I assumed the default here is PARALLEL UNSAFE. I'll
fix that.
> In my experience, the requirement for sampling with replacement is
> about as common as the requirement for sampling without replacement,
> so it seems odd to provide one but not the other. Of course, we can
> always add a with-replacement function later, and give it a different
> name. But maybe array_sample() could be used for both, with a
> "with_replacement" boolean parameter?
We could also add a "with_replacement" boolean parameter to
array_sample() later, defaulting to false. I do not have a strong
opinion about that and will implement it if desired. Any opinions on
default vs. no default?
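
For reference, the two sampling modes under discussion could look roughly like this. It is a Python sketch only, not the patch's C code; the signature, the `rng` argument, and the error handling are assumptions made for illustration.

```python
import random


def array_sample(items, n, with_replacement=False, rng=None):
    """Pick n elements from items, with or without replacement."""
    if rng is None:
        rng = random.Random()
    if n < 0:
        raise ValueError("sample size must not be negative")
    pool = list(items)

    if with_replacement:
        if n > 0 and not pool:
            raise ValueError("cannot sample from an empty array")
        # Each draw is independent, so elements may repeat and n may
        # exceed the array length.
        return [pool[rng.randrange(len(pool))] for _ in range(n)]

    if n > len(pool):
        raise ValueError("sample size exceeds array length")
    # Partial Fisher-Yates: settle only the first n slots.
    for i in range(n):
        j = rng.randrange(i, len(pool))  # uniform integer in [i, len - 1]
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:n]
```

With the parameter defaulting to false, existing two-argument calls keep their without-replacement behavior, which is the main argument for adding it as a defaulted parameter rather than as a separate function.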
Martin