From: | Shanti-Dominique <shanti(at)mildred(dot)fr> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Efficient rows filter for array inclusion with gin index |
Date: | 2024-02-28 15:49:40 |
Message-ID: | f2dd527d-89d0-4bc0-a0a5-52b41f109c4f@mildred.fr |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Replying to myself after more investigation.
On 28/02/2024 12:05, Shanti-Dominique wrote:
>
>
> 2)
> SELECT *
> FROM items i1
> JOIN item_paths p1 ON i1.ref_id = p1.ref_id
> JOIN items i2 ON ARRAY[i2.ref_id] <@ p1.item_path
> WHERE ...
>
> 2) uses the operator <@ which is supported by the gin index, the test
> for inclusion is fast and the query does not run a sequential scan
> over the whole "item_paths" table. However, because of the
> ARRAY[i2.ref_id] construct, it performs a sequential scan on i2.
I was under the assumption that the ARRAY[] construct prevented
postgresql from efficiently using the index on the other side of the
operator, but I think I was mistaken. On a database full of data, I
tried getting around this but did not see any improvement of performance.
First I tried to add an index on the single element array:
CREATE FUNCTION uuidarr(ref_id uuid) RETURNS uuid[]
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
RETURN ARRAY[ref_id];
CREATE INDEX items_ref_id_arr2_idx ON items USING gin (uuidarr(ref_id));
EXPLAIN
SELECT i2.*
FROM items i1
JOIN item_paths p1 ON i1.ref_id = p1.ref_id
JOIN items i2 ON uuidarr(i2.ref_id) <@ p1.item_path
WHERE i1.name = 'a';
The performance was even worse. Then I tried with a generated column:
CREATE TABLE items (
ref_id uuid DEFAULT public.gen_random_uuid() NOT NULL,
ref_id_array uuid[] GENERATED ALWAYS AS (uuidarr(ref_id)) STORED,
name character varying,
parent_ref_id uuid
);
CREATE INDEX items_ref_id_array_idx ON items USING gin (ref_id_array);
EXPLAIN
SELECT i2.*
FROM items i1
JOIN item_paths p1 ON i1.ref_id = p1.ref_id
JOIN items i2 ON i2.ref_id_array <@ p1.item_path
WHERE i1.name = 'a';
The performance was very similar to the query with ARRAY[...]
It seems there is no good solution for the general case, apart from
changing the structure of my dataset and removing the use of arrays
entirely.
I think I'll update my codebase and use <@ where it makes sense and =
ANY in other places, but it'll be difficult to know for sure without
running the query which one will be better.
From | Date | Subject | |
---|---|---|---|
Next Message | Sasmit Utkarsh | 2024-02-28 15:53:50 | Need Assistance: Command to display procedures does not work |
Previous Message | Jason Long | 2024-02-28 15:43:32 | Re: PostgreSQL Read-only mode usage |