Re: New function normal_rand_array function to contrib/tablefunc.

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New function normal_rand_array function to contrib/tablefunc.
Date: 2024-07-17 06:29:07
Message-ID: 87v814wmz0.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:

> My suggestion would be to mirror the signatures of the core random()
> functions more closely, and have this:
>
> 1). rand_array(numvals int, minlen int, maxlen int)
> returns setof float8[]
>
> 2). rand_array(numvals int, minlen int, maxlen int,
> minval int, maxval int)
> returns setof int[]
>
> 3). rand_array(numvals int, minlen int, maxlen int,
> minval bigint, maxval bigint)
> returns setof bigint[]
>
> 4). rand_array(numvals int, minlen int, maxlen int,
> minval numeric, maxval numeric)
> returns setof numeric[]

this is indeed a more clean and correct APIs, I will use the above ones
in the next version. Thanks for the suggestion.

It is just not clear to me how verbose the document should to be, and
where the document should be, tablefunc.sgml, the comment above the
function or the places just the code? In many cases you provides above
or below are just implementation details, not the API declaimed purpose.

> Something else that's not obvious is that this patch is relying on the
> core random functions, which means that it's using the same PRNG state
> as those functions. That's probably OK, but it should be documented,
> because it's different from tablefunc's normal_rand() function, which
> uses pg_global_prng_state.

My above question applies to this comment.

> In particular, what this means is that
> calling setseed() will affect the output of these new functions, but
> not normal_rand(). Incidentally, that makes writing tests much simpler
> -- just call setseed() first and the output will be repeatable.

Good to know this user case. for example, should this be documented?

>> In some cases the function returns an empty array. Is it also expected?
>>
>
> Perhaps it would be useful to have separate minimum and maximum array
> length arguments, rather than a mean array length argument.

I'm not sure which one is better, but main user case of this function
for testing pupose, so it I think minimum and maximum array length is
good for me too.

>
> Actually, I find the current behaviour somewhat counterintuitive. Only
> after reading the source code did I realise what it's actually doing,
> which is this:
>
> Row 1: array of random length in range [0, meanarraylen]
> Row 2: array of length 2*meanarraylen - length of previous array
> Row 3: array of random length in range [0, meanarraylen]
> Row 4: array of length 2*meanarraylen - length of previous array
> ...
>
> That's far from obvious (it's certainly not documented) and I don't
> think it's a particularly good way of achieving a specified mean array
> length, because only half the lengths are random.

I'm not sure how does this matter in real user case.

> One thing that's definitely needed is more documentation. It should
> have its own new subsection, like the other tablefunc functions.

is the documentaion for the '2*meanarraylen - lastarraylen'?

and What is new subsection, do you mean anything wrong in
'tablefunc.sgml', I did have some issue to run 'make html', but the
error exists before my patch, so I change the document carefully without
testing it. do you know how to fix the below error in 'make html'?

$/usr/bin/xsltproc --nonet --path . --path . --stringparam pg.version '18devel' stylesheet.xsl postgres-full.xml

I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"
compilation error: file stylesheet.xsl line 6 element import
xsl:import : unable to load http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/common/entities.ent
stylesheet-html-common.xsl:4: warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/common/entities.ent"
%common.entities;
^
stylesheet-html-common.xsl:124: parser error : Entity 'primary' not defined
translate(substring(&primary;, 1, 1),

> Something else that confused me is why this function is called
> normal_rand_array(). The underlying random functions that it's calling
> return random values from a uniform distribution, not a normal
> distribution. Arrays of normally distributed random values might also
> be useful, but that's not what this patch is doing.

OK, you are right, your new names should be better.

> Also, the function accepts float8 minval and maxval arguments, and
> then simply ignores them and returns random float8 values in the range
> [0,1), which is highly counterintuitive.

This is a obvious bug and it only exists in float8 case IIUC, will fix
it in the next version.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-07-17 06:31:22 Re: New function normal_rand_array function to contrib/tablefunc.
Previous Message Amit Kapila 2024-07-17 06:24:45 Re: long-standing data loss bug in initial sync of logical replication