Re: general purpose array_sort

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, "andreas(at)proxel(dot)se" <andreas(at)proxel(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: general purpose array_sort
Date: 2024-11-04 05:46:39
Message-ID: ZyhfvysfbzqNEX1M@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 03, 2024 at 11:33:05AM +0800, Junwang Zhao wrote:
> Rebase needed due to array_reverse committed, PFA v11.

There has been another conflict since you have posted this version
(noticed that after my business in 027124a872d7). I have looked at
0001.

+ if (ARR_NDIM(array) < 1)
+ PG_RETURN_ARRAYTYPE_P(array);
There is no point in doing a sort if the array has only one element.
You can add a check based on "ARR_DIMS(array)[0] < 2" to achieve that.

+ typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
+ if (!OidIsValid(typentry->lt_opr))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify ordering operator for type %s",
+ format_type_be(elmtyp))));

The patch introduces two error paths based on the fact that ordering
operators could not be found depending on a data type that lacks the
ordering operator and the array ordering operator part. It is right
to issue an error if these are lacking, like the various stats paths.
Should we have some regression tests with specific data types for
these errors, though? The stats paths don't care much about these
error cases, but it does not mean that we should not care about them.
In short, let's have negative test coverage if we can.

+typedef struct ArraySortCachedInfo
+{
+ TypeCacheEntry *typentry;
+ TypeCacheEntry *array_typentry;
+} ArraySortCachedInfo;

Let's put that at the top of the file, with a comment about how it
links to array_sort() for the caching with fn_extra. Let's also
document the meaning of the fields.

FWIW, I am confused by this implementation, where you have to allocate
the two TypeCacheEntry because of the fact that you have to deal with
the 1-dimension case and the multi-dimension case. In the context of
a single function call, why do you need both typentry and
array_typentry, actually? Wouldn't it be enough to use one typentry
that points to the typcache, meaning that you don't really need to use
the extra business with fn_mcxt, no? If you require both (because I
may be wrong), perhaps you should have a regression test that's able
to break when removing array_typentry, changing the code to only rely
on typentry. Note: I have just removed array_typentry in a quick
test, current coverage was happy about it. Feel free to prove me
wrong.

Agreed that the function should be immutable. The results are fixed
depending on the input even with the COLLATE clauses appended.

Let's add something when there is only one element in the first
dimension of the array, say two cases one with an int and one with an
array of ints like:
SELECT array_sort('{1}'::int[]);
SELECT array_sort('{{1}}'::int[]);
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-04 05:49:39 Re: [PATCH] Rename trim_array() for consistency with the rest of array_* functions
Previous Message Amit Kapila 2024-11-04 05:26:22 Re: Pgoutput not capturing the generated columns