Re: New function normal_rand_array function to contrib/tablefunc.

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New function normal_rand_array function to contrib/tablefunc.
Date: 2024-10-16 07:43:27
Message-ID: 87ldyojxwg.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Dean,

Thanks for the detailed feedback! Here is the rebased version.

> 1). In the tests:
>
> +select setseed(0.8);
> +select rand_array(10, 0, 3, 50::int, 80::int);
..
> this should really have a comment block to distinguish these new tests
> from the preceeding normal_rand() tests.

> 3). It's only necessary to call setseed() once to get a reproducible
> set of results from then on.

> 4). I'd use setseed(0) or setseed(0.5), since those values are exactly
> representable as double precision values, unlike 0.8, ensuring that it
> works the same on all platforms. It might not matter, but why take the
> risk?

All Done, very interesting about the reason why the 0.8 is bad.

> 2). It's worth also having tests for the error cases.

After the code refactor, looks the ERROR is not reachable, so I added
both Assert(false) and elog(ERROR) with no test case for that.

>
> 5). The float8 case still has minimum and maximum value arguments that
> it just ignores. It should either not take those arguments, or it
> should respect them, and try to return float8 values in the requested
> range. I suspect that trying to return float8 values in the requested
> range would be hard, if not impossible, due to the inexact nature of
> double precision arithmetic. That's why I suggested earlier having the
> float8 function not take minval/maxval arguments, and just return
> values in the range 0 to 1.
>
> 11). If the float8 case is made to not have minval/maxval arguments, this code:
>
> + Datum minval = PG_GETARG_DATUM(3),
> + maxval = PG_GETARG_DATUM(4);
>
> and the FunctionCallInfo setup code needs to be different for the
> float8 case, in order to avoid reading and setting undefined
> arguments. Perhaps introduce a "need_val_bounds" boolean variable,
> based on the datatype.

I should have noticed the float8 issue after reading your first
reply.. Actually I don't have speical reason for which I have to support
float8. At least when I working on the patch, I want some toast and
non-toast data type, so supporting both int/bigint and numeric should be
good, at least for my purpose. So in the attached version, I removed the
support for float8 for simplification.

>
> 6). This new function:
>
> +static Datum
> +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
> +{
>
> should have a comment block. In particular, it should document what
> its inputs and outputs are.

Done.
>
> 7). This code:
>
> + int num_tuples = PG_GETARG_INT32(0);
> + int minlen = PG_GETARG_INT32(1);
> + int maxlen = PG_GETARG_INT32(2);
> + Datum minval = PG_GETARG_DATUM(3),
> + maxval = PG_GETARG_DATUM(4);
> + rand_array_fctx *fctx;
> +
> + if (datatype == INT4OID)
> + random_fn_oid = F_RANDOM_INT4_INT4;
> + else if (datatype == INT8OID)
...
> should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
> repeating all those checks for every call.

Done.
>
> 8). I think it would be neater to code the "if (datatype == INT4OID)
> ... else if ..." as a switch statement.
Done.
>
>
> 9). I would swap the last 2 bound checks round, and simplify the error
> messages as follows:

Done.
>
> Also note that primary error messages should not end with a period.
> See https://www.postgresql.org/docs/current/error-style-guide.html

Thanks for sharing this!

> 10). In this error:
>
> + elog(ERROR, "unsupported type %d for rand_array function.",
> + datatype);
>
> "datatype" is of type Oid, which is unsigned, and so it should use
> "%u" not "%d". Also, as above, it should not end with a period, so it
> should be:
>
> + elog(ERROR, "unsupported type %u for rand_array function",
> + datatype);

Done.

>
> 12). This code:
>
> + random_len_fcinfo->args[0].value = minlen;
> + random_len_fcinfo->args[1].value = maxlen;
>
> should really be:
>
> + random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
> + random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
>
> It amounts to the same thing, but documents the fact that it's
> converting an integer to a Datum.

Done.
>
>
> 13). These new functions are significantly under-documented,
> especially when compared to all the other functions on
> https://www.postgresql.org/docs/current/tablefunc.html
>
> They really should have their own subsection, along the same lines as
> "F.41.1.1. Normal_rand", explaining what the functions do, what their
> arguments mean, and giving a couple of usage examples.

Done, very impresive feedback and I know why PostgreSQL can always have
user friendly documentation now.

--
Best Regards
Andy Fan

Attachment Content-Type Size
v20241016-0001-Add-functions-rand_array-function-to-contr.patch text/x-diff 13.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-10-16 07:45:33 Re: replace strtok()
Previous Message Peter Eisentraut 2024-10-16 07:42:22 Re: replace strtok()