Re: New function normal_rand_array function to contrib/tablefunc.

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New function normal_rand_array function to contrib/tablefunc.
Date: 2024-10-15 10:45:03
Message-ID: CAEZATCXZt1sUuT4SdZo5vn1rJtm2Rp3UCUsH2YzyBt-vHRXrrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 29 Aug 2024 at 05:39, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>
> Yes, that's a valid usage. the new vesion is attached. I have changed
> the the commit entry [1] from "Waiting on Author" to "Needs review".
>

Note that this needs a rebase, following commit 4681ad4b2f.

Here are a few more review comments:

1). In the tests:

+select setseed(0.8);
+select rand_array(10, 0, 3, 50::int, 80::int);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::bigint, 80::bigint);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::float8, 80::float8);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::float4, 80::float4);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::numeric, 80::numeric);

this should really have a comment block to distinguish these new tests
from the preceeding normal_rand() tests.

2). It's worth also having tests for the error cases.

3). It's only necessary to call setseed() once to get a reproducible
set of results from then on.

4). I'd use setseed(0) or setseed(0.5), since those values are exactly
representable as double precision values, unlike 0.8, ensuring that it
works the same on all platforms. It might not matter, but why take the
risk?

5). The float8 case still has minimum and maximum value arguments that
it just ignores. It should either not take those arguments, or it
should respect them, and try to return float8 values in the requested
range. I suspect that trying to return float8 values in the requested
range would be hard, if not impossible, due to the inexact nature of
double precision arithmetic. That's why I suggested earlier having the
float8 function not take minval/maxval arguments, and just return
values in the range 0 to 1.

6). This new function:

+static Datum
+rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
+{

should have a comment block. In particular, it should document what
its inputs and outputs are.

7). This code:

+ int num_tuples = PG_GETARG_INT32(0);
+ int minlen = PG_GETARG_INT32(1);
+ int maxlen = PG_GETARG_INT32(2);
+ Datum minval = PG_GETARG_DATUM(3),
+ maxval = PG_GETARG_DATUM(4);
+ rand_array_fctx *fctx;
+
+ if (datatype == INT4OID)
+ random_fn_oid = F_RANDOM_INT4_INT4;
+ else if (datatype == INT8OID)
+ random_fn_oid = F_RANDOM_INT8_INT8;
+ else if (datatype == FLOAT8OID)
+ random_fn_oid = F_RANDOM_;
+ else if (datatype == NUMERICOID)
+ random_fn_oid = F_RANDOM_NUMERIC_NUMERIC;
+ else
+ elog(ERROR, "unsupported type %d for rand_array function.",
+ datatype);
+
+ if (num_tuples < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("number of rows cannot be negative")));
+
+ if (minlen > maxlen)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("minlen must not be greater than maxlen.")));
+
+ if (minlen < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("minlen and maxlen must be greater than or
equal to zero.")));

should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
repeating all those checks for every call.

8). I think it would be neater to code the "if (datatype == INT4OID)
... else if ..." as a switch statement.

9). I would swap the last 2 bound checks round, and simplify the error
messages as follows:

if (minlen < 0)
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("minlen must be greater than or equal to zero"));

if (maxlen < minlen)
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("maxlen must be greater than or equal to minlen"));

Also note that primary error messages should not end with a period.
See https://www.postgresql.org/docs/current/error-style-guide.html

10). In this error:

+ elog(ERROR, "unsupported type %d for rand_array function.",
+ datatype);

"datatype" is of type Oid, which is unsigned, and so it should use
"%u" not "%d". Also, as above, it should not end with a period, so it
should be:

+ elog(ERROR, "unsupported type %u for rand_array function",
+ datatype);

11). If the float8 case is made to not have minval/maxval arguments, this code:

+ Datum minval = PG_GETARG_DATUM(3),
+ maxval = PG_GETARG_DATUM(4);

and the FunctionCallInfo setup code needs to be different for the
float8 case, in order to avoid reading and setting undefined
arguments. Perhaps introduce a "need_val_bounds" boolean variable,
based on the datatype.

12). This code:

+ random_len_fcinfo->args[0].value = minlen;
+ random_len_fcinfo->args[1].value = maxlen;

should really be:

+ random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
+ random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);

It amounts to the same thing, but documents the fact that it's
converting an integer to a Datum.

13). These new functions are significantly under-documented,
especially when compared to all the other functions on
https://www.postgresql.org/docs/current/tablefunc.html

They really should have their own subsection, along the same lines as
"F.41.1.1. Normal_rand", explaining what the functions do, what their
arguments mean, and giving a couple of usage examples.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2024-10-15 10:55:50 Re: Use function smgrclose() to replace the loop
Previous Message Daniel Gustafsson 2024-10-15 10:42:39 Re: Add support to TLS 1.3 cipher suites and curves lists