Re: Sort functions with specialized comparators

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Антуан Виолин <violin(dot)antuan(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sort functions with specialized comparators
Date: 2025-01-07 04:43:19
Message-ID: CANWCAZa+O3QiNKW1YYTOusHL-vEFcSz2VH+-qPv1EfvCpc1b9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 6, 2025 at 10:51 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> > On 6 Jan 2025, at 15:54, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:

> > argument. Like some other patches in this series, this does have the
> > side effect of removing the ability to skip quinique(), so that should
> > be benchmarked (I can do that this week unless you beat me to it).
>
> With the same setup as in the first message of this thread we can do:
>
> postgres=# SELECT _int_contains(arr,ARRAY[1]) FROM arrays_to_sort;
>
> before patch patch
> Time: 567.928 ms
> after patch
> Time: 890.297 ms
> timing of this function is dominated by PREPAREARR(a);
>
> What bothers me is that PREPAREARR(a); is returning new array in case of empty input. That's why I considered little refactoring of resize_intArrayType(): reorder cases so that if (num == ARRNELEMS(a)) was first.

Hmm, I'm confused. First, none of the arrays are empty that I can see
-- am I missing something?

Then, the first message setup is

CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

...so most of the time is in sorting the big array, and I don't see a
regression, the opposite in fact:

SELECT _int_contains(arr,ARRAY[1]) FROM arrays_to_sort;

master:
1492.552 ms

v9:
873.697 ms

The case I was concerned about was if the big array was already sorted
and unique. Then, it's conceivable that unnecessarily running qunique
would make things slower, but I don't see that either:

--ordered
CREATE TABLE arrays_sorted AS
SELECT a arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

SELECT _int_contains(arr,ARRAY[1]) FROM arrays_sorted;

master:
31.388

v9:
28.247

--
John Naylor
Amazon Web Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2025-01-07 04:55:50 Re: Conflict detection for update_deleted in logical replication
Previous Message Tom Lane 2025-01-07 03:29:07 Re: allow changing autovacuum_max_workers without restarting