From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Mickael van der Beek <mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com> |
Cc: | Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Array of integer indexed nested-loop semi join |
Date: | 2022-05-23 14:10:48 |
Message-ID: | CAMkU=1ycu-0Dj-iYFuD8rZtmfoiQr1bdgdixCcTY65s6=qUCWQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, May 23, 2022 at 3:57 AM Mickael van der Beek <
mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com> wrote:
> Hello Jeff,
>
> Sadly, the query you suggested won't work because you are only returning
> the first row of the matching inner query rows.
>
Sure, but the query I replaced did the same thing. (I thought that was
what you wanted, but I guess that was just to get it to run fast enough to
ever finish--in that case it is probably better to use EXPLAIN without the
ANALYZE so that we can see the plan of the correct query). To get around
that one-row limit you have to write it somewhat differently, getting rid
of the ARRAY and adding an array_agg():
SELECT fu.*
FROM
fact_users AS fu
WHERE
fu.w2_page_idxs && (select array_agg(idx) from fact_pages where
attribute_idxs && ARRAY[201]);
This way of writing it is better, as it still works with the LIMIT 1 but
also works without it. This still uses the indexes for me, at least when
enable_seqscan is off.
> The INNER JOIN version of the query will return all matching rows but also
> include duplicates:
>
You could just add a DISTINCT to get rid of the duplicates. Of course that
will also take some time on a large returned data set, but probably less
time than scanning a giant table. I think this is probably cleaner than
the alternatives.
>
> The reason I'm using integer arrays is because it is the only way I have
> found in PostgreSQL to get fast inclusion / exclusion checks on large
> datasets (hundreds of millions of values).
> Did I misunderstand your response?
>
I don't know if you misunderstood. I meant specifically the intarray
extension. You can use integer arrays with built-in GIN indexes without
help from the intarray extension. Maybe you know that already and are just
saying that the extension is even faster than the built-in indexed
operators are and you need that extra speed.
Cheers,
Jeff
>
From | Date | Subject | |
---|---|---|---|
Next Message | Mickael van der Beek | 2022-05-23 14:46:41 | Re: Array of integer indexed nested-loop semi join |
Previous Message | Mickael van der Beek | 2022-05-23 07:57:25 | Re: Array of integer indexed nested-loop semi join |