Re: general purpose array_sort

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, "andreas(at)proxel(dot)se" <andreas(at)proxel(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: general purpose array_sort
Date: 2025-03-31 11:02:19
Message-ID: CAEG8a3J2dZemiK3k6wFjmxAzCH-urx9=5vGB-s8Y5pSSw6gwxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,

On Mon, Mar 31, 2025 at 5:58 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I spent some time looking at the v17 patchset. There were some pretty
> strange things in it --- why were some of the variants of array_sort()
> marked as volatile, for example?

I think this was due to some copy-paste of the code nearby.

> But the two things I'd like to
> suggest functionality-wise are:
>
> * The second argument of the variants with booleans should be defined
> as true=descending, not true=ascending. It seems a little odd to me
> for the default of a boolean option not to be "false". Also, then
> you don't need an inversion between the second and third arguments.
> I'm not dead set on this but it just seems a little cleaner.
>
Agreed.

> * I see that the code is set up to detect an unsortable input type
> before it takes the fast exit for "no sort required". I think this
> is poor engineering: we ought to make the fast path as fast as
> possible. The can't-sort case is so rare in real-world usage that
> I do not think it matters if the error isn't thrown by every possible
> call. Besides which, it is inconsistent anyway: consider
> SELECT array_sort(NULL::xid[]);
> which will not error because it will never reach the C code. Why's
> that okay but delivering an answer for "array_sort('{1}'::xid[])"
> is not? I think "throw error only if we must sort and cannot" is
> a perfectly fine definition.
Agreed.

>
> At the code level, I didn't like the way that the multiple entry
> points were set up. I think it's generally cleaner code to have
> a worker function with plain C call and return coding and make
> all the SQL-visible functions be wrappers around that. Also the
> caching mechanism was overcomplicated, in particular because we
> do not need a cache lookup to know which sort operators apply to
> arrays.

Agreed, your refactor made the code cleaner.

>
> So all that leads me to v18 attached. (I merged the two patches
> into one, didn't see much value in splitting them.)
>
> In v18, it's somewhat annoying that the typcache doesn't cache
> the typarray field; we would not need a separate get_array_type()
> lookup if it did. I doubt there is any real reason for that except
> that pg_type.typarray didn't exist when the typcache was invented.
> So I'm tempted to add it. But I looked at existing callers of
> get_array_type() and none of them are adjacent to typcache lookups,
> so only array_sort would be helped immediately. I left it alone
> for the moment; wonder if anyone else has an opinion?

The need for `elmtyp` and `array_type` here because a column can
have arrays with varying dimensions. Maybe other callers don't share
this behavior?

>
> regards, tom lane
>

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2025-03-31 11:22:35 Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION
Previous Message Yugo Nagata 2025-03-31 11:00:57 Prevent internal error at concurrent CREATE OR REPLACE FUNCTION