Re: general purpose array_sort

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, "andreas(at)proxel(dot)se" <andreas(at)proxel(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: general purpose array_sort
Date: 2025-03-30 21:58:11
Message-ID: 3707751.1743371891@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I spent some time looking at the v17 patchset. There were some pretty
strange things in it --- why were some of the variants of array_sort()
marked as volatile, for example? But the two things I'd like to
suggest functionality-wise are:

* The second argument of the variants with booleans should be defined
as true=descending, not true=ascending. It seems a little odd to me
for the default of a boolean option not to be "false". Also, then
you don't need an inversion between the second and third arguments.
I'm not dead set on this but it just seems a little cleaner.

* I see that the code is set up to detect an unsortable input type
before it takes the fast exit for "no sort required". I think this
is poor engineering: we ought to make the fast path as fast as
possible. The can't-sort case is so rare in real-world usage that
I do not think it matters if the error isn't thrown by every possible
call. Besides which, it is inconsistent anyway: consider
SELECT array_sort(NULL::xid[]);
which will not error because it will never reach the C code. Why's
that okay but delivering an answer for "array_sort('{1}'::xid[])"
is not? I think "throw error only if we must sort and cannot" is
a perfectly fine definition.

At the code level, I didn't like the way that the multiple entry
points were set up. I think it's generally cleaner code to have
a worker function with plain C call and return coding and make
all the SQL-visible functions be wrappers around that. Also the
caching mechanism was overcomplicated, in particular because we
do not need a cache lookup to know which sort operators apply to
arrays.

So all that leads me to v18 attached. (I merged the two patches
into one, didn't see much value in splitting them.)

In v18, it's somewhat annoying that the typcache doesn't cache
the typarray field; we would not need a separate get_array_type()
lookup if it did. I doubt there is any real reason for that except
that pg_type.typarray didn't exist when the typcache was invented.
So I'm tempted to add it. But I looked at existing callers of
get_array_type() and none of them are adjacent to typcache lookups,
so only array_sort would be helped immediately. I left it alone
for the moment; wonder if anyone else has an opinion?

regards, tom lane

Attachment Content-Type Size
v18-general-purpose-array_sort.patch text/x-diff 17.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2025-03-30 22:05:57 Re: Non-text mode for pg_dumpall
Previous Message Gurjeet Singh 2025-03-30 20:46:48 Re: Thinko in pgstat_build_snapshot()