Re: [PATCH] Introduce array_shuffle() and array_sample()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Introduce array_shuffle() and array_sample()
Date: 2022-07-18 20:27:30
Message-ID: CA+TgmoaPKwPqhmrk7i1jRjqqt5=JDLeGze5wYPav5ietV4OFbw@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher
<martin(dot)kalcher(at)aboutsource(dot)net> wrote:
> Thanks for all your feedback and help. I got a patch that i consider
> ready for review. It introduces two new functions:
>
> array_shuffle(anyarray) -> anyarray
> array_sample(anyarray, integer) -> anyarray
>
> array_shuffle() shuffles an array (obviously). array_sample() picks n
> random elements from an array.

I like this idea.

I think it's questionable whether the behavior of array_shuffle() is
correct for a multi-dimensional array. The implemented behavior is to
keep the dimensions as they were, but permute the elements across all
levels at random. But there are at least two other behaviors that seem
potentially defensible: (1) always return a 1-dimensional array, (2)
shuffle the sub-arrays at the top-level without the possibility of
moving elements within or between sub-arrays. Whichever behavior we
decide is best should be documented.
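
To make the three candidate behaviors concrete, here is a rough model
in Python, using nested lists as a stand-in for a multi-dimensional
array. This is only a sketch of the semantics, not the patch's C code;
the function names (flatten, reshape_like, shuffle_keep_dims,
shuffle_flatten, shuffle_top_level) are made up for illustration.

```python
import random

def flatten(a):
    """Return all leaf elements of a nested list in row-major order."""
    if not isinstance(a, list):
        return [a]
    out = []
    for item in a:
        out.extend(flatten(item))
    return out

def reshape_like(flat_iter, template):
    """Rebuild the shape of `template`, drawing values from `flat_iter`."""
    if not isinstance(template, list):
        return next(flat_iter)
    return [reshape_like(flat_iter, t) for t in template]

def shuffle_keep_dims(a, rng):
    """Behavior described for the patch: keep dimensions, permute all elements."""
    flat = flatten(a)
    rng.shuffle(flat)
    return reshape_like(iter(flat), a)

def shuffle_flatten(a, rng):
    """Alternative (1): always return a 1-dimensional result."""
    flat = flatten(a)
    rng.shuffle(flat)
    return flat

def shuffle_top_level(a, rng):
    """Alternative (2): permute the top-level sub-arrays only."""
    out = list(a)  # shallow copy; sub-arrays themselves are left intact
    rng.shuffle(out)
    return out

rng = random.Random(0)  # fixed seed only so repeated runs match
a = [[1, 2], [3, 4], [5, 6]]
keep = shuffle_keep_dims(a, rng)
flat = shuffle_flatten(a, rng)
top = shuffle_top_level(a, rng)
```

With alternative (2), 1 and 2 always stay together in the same
sub-array; with the other two they generally do not.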

array_sample() will return elements in random order when sample_size <
array_size, but in the original order when sample_size >= array_size.
Similarly, it will always return a 1-dimensional array in the former
case, but will keep the original dimensions in the latter case. That
seems pretty hard to defend. I think it should always return a
1-dimensional array with elements in random order, and I think this
should be documented.
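
The behavior I am suggesting for array_sample() can be modeled the
same way. Again this is a Python sketch over nested lists, not the
patch itself; array_sample_model and flatten are hypothetical names,
and clamping n to the array size is an assumption about what should
happen when sample_size exceeds array_size.

```python
import random

def flatten(a):
    """Return all leaf elements of a nested list in row-major order."""
    if not isinstance(a, list):
        return [a]
    out = []
    for item in a:
        out.extend(flatten(item))
    return out

def array_sample_model(a, n, rng):
    """Return n elements of `a` as a flat list in random order.

    The result is 1-dimensional and randomly ordered for every n,
    including n >= the number of elements (assumption: n is clamped).
    """
    if n < 0:
        raise ValueError("sample size must not be negative")
    flat = flatten(a)
    n = min(n, len(flat))
    # random.Random.sample picks n distinct positions and returns them
    # in selection order, so n == len(flat) yields a full permutation.
    return rng.sample(flat, n)

rng = random.Random(0)  # fixed seed only so repeated runs match
a = [[1, 2], [3, 4], [5, 6]]
small = array_sample_model(a, 2, rng)
full = array_sample_model(a, 6, rng)
over = array_sample_model(a, 10, rng)
```

The point is that there is a single code path: the sample_size >=
array_size case is not special-cased into returning the input as is.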

I also think you should add test cases involving multi-dimensional
arrays, as well as arrays with non-default bounds, e.g. try
shuffling or sampling a value like
'[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]
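
For anyone not used to that literal syntax: the value above is a 3x2
array whose first dimension is subscripted 8..10 and whose second is
subscripted -6..-5. A small Python sketch of that subscript mapping
follows (the helper name subscripts is made up for illustration):

```python
def subscripts(lower_bounds, values):
    """Map each element of a 2-D nested list to its (i, j) subscript pair."""
    lb1, lb2 = lower_bounds
    return {
        (lb1 + i, lb2 + j): v
        for i, row in enumerate(values)
        for j, v in enumerate(row)
    }

# Models '[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]
elems = subscripts((8, -6), [[1, 2], [3, 4], [5, 6]])
first = elems[(8, -6)]
last = elems[(10, -5)]
```

A test with such a value shows whether the result keeps the input's
lower bounds or resets them to 1.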

--
Robert Haas
EDB: http://www.enterprisedb.com
