| From: | Mickael van der Beek <mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com> | 
|---|---|
| To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> | 
| Cc: | Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Array of integer indexed nested-loop semi join | 
| Date: | 2022-05-23 07:57:25 | 
| Message-ID: | CACM-Oyd=z=qeDnFmp5PBW4+tHecggJie4KcW=L9uS68PCFFu4g@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
Hello Jeff,
Sadly, the query you suggested won't work because you are only returning
the first row of the matching inner query rows.
Example:
SELECT
>   u.idx,
>   u.page_idxs
> FROM
>   (
>     VALUES
>       (1, ARRAY[11, 21, 31]),
>       (2, ARRAY[12, 21, 32]),
>       (3, ARRAY[13, 23, 31])
>   )
>     AS u(idx, page_idxs)
> WHERE
>   u.page_idxs && ARRAY[(
>     SELECT
>       p.idx
>     FROM
>       (
>         VALUES
>           (11, ARRAY[101, 201, 301]),
>           (21, ARRAY[102, 201, 302]),
>           (13, ARRAY[103, 203, 301])
>       )
>         AS p(idx, attribute_idxs)
>     WHERE
>       p.attribute_idxs && ARRAY[201]
>     FETCH FIRST 1 ROWS ONLY
>   )]
> ;
This query only returns one row while it should actually return two:
1 {11,21,31}
The INNER JOIN version of the query will return all matching rows but also
include duplicates:
SELECT
>   u.idx,
>   u.page_idxs
> FROM
>   (
>     VALUES
>       (1, ARRAY[11, 21, 31]),
>       (2, ARRAY[12, 21, 32]),
>       (3, ARRAY[13, 23, 31])
>   )
>     AS u(idx, page_idxs)
> INNER JOIN
>   (
>     SELECT
>       p.idx
>     FROM
>       (
>         VALUES
>           (11, ARRAY[101, 201, 301]),
>           (21, ARRAY[102, 201, 302]),
>           (13, ARRAY[103, 203, 301])
>       )
>         AS p(idx, attribute_idxs)
>     WHERE
>       p.attribute_idxs && ARRAY[201]
>   )
>   AS p2
>   ON u.page_idxs && ARRAY[p2.idx]
> ;
Results:
1 {11,21,31}
> 1 {11,21,31}
> 2 {12,21,32}
As far as I know, the the IN + sub-expression query can't work since the
left side of the operation is an array of integers and the right side a set
of rows with a single integer column.
The reason I'm using integer arrays is because it is the only way I have
found in PostgreSQL to get fast inclusion / exclusion checks on large
datasets (hundreds of millions of values).
Did I misunderstand your response?
Thank you for the ongoing help,
Mickael
On Mon, May 23, 2022 at 4:45 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>
> On Fri, May 20, 2022 at 6:42 AM Mickael van der Beek <
> mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com> wrote:
>
>>
>> Query:
>>
>> EXPLAIN (
>>>   ANALYZE,
>>>   VERBOSE,
>>>   COSTS,
>>>   BUFFERS,
>>>   TIMING
>>> )
>>> SELECT
>>>   fu.w2_page_idxs
>>> FROM
>>>   fact_users
>>>     AS fu
>>> WHERE
>>>   EXISTS (
>>>     SELECT
>>>     FROM
>>>       (
>>>         SELECT
>>>           ARRAY[idx] AS page_idx
>>>         FROM
>>>           fact_pages
>>>         WHERE
>>>           attribute_idxs && ARRAY[300000160]
>>>         FETCH FIRST 1 ROWS ONLY
>>>       )
>>>         AS fp
>>>     WHERE
>>>       fu.w2_page_idxs && fp.page_idx
>>>   )
>>> ;
>>
>>
>> Without any surprises, the planner is using a sequential scan on the
>> "fact_users" table which is very large instead of using the GIN index set
>> on the "w2_page_idxs" column.
>>
>
> For me, using the subquery in and expression, instead of the EXISTS, does
> get it to use the gin index.  And I think it must give the same results.
>
> SELECT
>   fu.w2_page_idxs
> FROM  fact_users AS fu
> WHERE
>       fu.w2_page_idxs && ARRAY[(select idx from fact_pages where
> attribute_idxs && ARRAY[3003] FETCH FIRST 1 ROWS ONLY)];
>
> But why are you using intarray?  That is unnecessary here, and by creating
> ambiguity about the array operators it might be harmful.
>
> Cheers,
>
> Jeff
>
>>
-- 
Mickael van der BeekWeb developer & Security analyst
mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Janes | 2022-05-23 14:10:48 | Re: Array of integer indexed nested-loop semi join | 
| Previous Message | Jeff Janes | 2022-05-23 02:45:14 | Re: Array of integer indexed nested-loop semi join |