Quick Links

Re: Efficient rows filter for array inclusion with gin index

From:	Shanti-Dominique <shanti(at)mildred(dot)fr>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: Efficient rows filter for array inclusion with gin index
Date:	2024-02-28 15:49:40
Message-ID:	f2dd527d-89d0-4bc0-a0a5-52b41f109c4f@mildred.fr
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Replying to myself after more investigation.

On 28/02/2024 12:05, Shanti-Dominique wrote:

>
>
> 2)
>     SELECT *
>     FROM    items i1
>             JOIN item_paths p1 ON i1.ref_id = p1.ref_id
>             JOIN items i2 ON ARRAY[i2.ref_id] <@ p1.item_path
>     WHERE   ...
>
> 2) uses the operator <@ which is supported by the gin index, the test
> for inclusion is fast and the query does not run a sequential scan
> over the whole "item_paths" table. However, because of the
> ARRAY[i2.ref_id] construct, it performs a sequential scan on i2.
I was under the assumption that the ARRAY[] construct prevented
postgresql from efficiently using the index on the other side of the
operator, but I think I was mistaken. On a database full of data, I
tried getting around this but did not see any improvement of performance.

First I tried to add an index on the single element array:

CREATE FUNCTION uuidarr(ref_id uuid) RETURNS uuid[]
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
RETURN ARRAY[ref_id];

CREATE INDEX items_ref_id_arr2_idx ON items USING gin (uuidarr(ref_id));

EXPLAIN
SELECT i2.*
FROM    items i1
        JOIN item_paths p1 ON i1.ref_id = p1.ref_id
        JOIN items i2 ON uuidarr(i2.ref_id) <@ p1.item_path
WHERE   i1.name = 'a';

The performance was even worse. Then I tried with a generated column:

CREATE TABLE items (
    ref_id uuid DEFAULT public.gen_random_uuid() NOT NULL,
    ref_id_array uuid[] GENERATED ALWAYS AS (uuidarr(ref_id)) STORED,
    name character varying,
    parent_ref_id uuid
);

CREATE INDEX items_ref_id_array_idx ON items USING gin (ref_id_array);

EXPLAIN
SELECT i2.*
FROM    items i1
        JOIN item_paths p1 ON i1.ref_id = p1.ref_id
        JOIN items i2 ON i2.ref_id_array <@ p1.item_path
WHERE   i1.name = 'a';

The performance was very similar to the query with ARRAY[...]

It seems there is no good solution for the general case, apart from
changing the structure of my dataset and removing the use of arrays
entirely.

I think I'll update my codebase and use <@ where it makes sense and =
ANY in other places, but it'll be difficult to know for sure without
running the query which one will be better.

In response to

Efficient rows filter for array inclusion with gin index at 2024-02-28 11:05:27 from Shanti-Dominique

Browse pgsql-general by date

	From	Date	Subject
Next Message	Sasmit Utkarsh	2024-02-28 15:53:50	Need Assistance: Command to display procedures does not work
Previous Message	Jason Long	2024-02-28 15:43:32	Re: PostgreSQL Read-only mode usage