Re: Array of integer indexed nested-loop semi join

From: Mickael van der Beek <mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Array of integer indexed nested-loop semi join
Date: 2022-05-20 10:42:43
Message-ID: CACM-OydNMuQL99ewApsVeVhOqmiGoPPnCrSDyeCDsWmFpAVowA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Jeff,

Sorry for the delay, here are the EXPLAIN ANALYSE results for one single
row in the inner-query:

Nested Loop Semi Join (cost=10000000993.81..10004731160.70 rows=536206
> width=28) (actual time=93765.182..93765.183 rows=0 loops=1)
> Output: fu.w2_page_idxs
> Join Filter: (fu.w2_page_idxs && (ARRAY[fact_pages.idx]))
> Rows Removed by Join Filter: 53762825
> Buffers: shared hit=569194 read=2821768
> I/O Timings: read=56586.955
> -> Seq Scan on public.fact_users fu
> (cost=10000000000.00..10003925857.68 rows=53620568 width=28) (actual
> time=79.139..67423.779 rows=53762825 loops=1)
> Output: fu.w2_page_idxs
> Buffers: shared hit=567884 read=2821768
> I/O Timings: read=56586.955
> -> Materialize (cost=993.81..994.50 rows=1 width=32) (actual
> time=0.000..0.000 rows=1 loops=53762825)
> Output: (ARRAY[fact_pages.idx])
> Buffers: shared hit=148
> -> Limit (cost=993.81..994.48 rows=1 width=32) (actual
> time=26.382..26.383 rows=1 loops=1)
> Output: (ARRAY[fact_pages.idx])
> Buffers: shared hit=148
> -> Bitmap Heap Scan on public.fact_pages
> (cost=993.81..70645.00 rows=103556 width=32) (actual time=26.378..26.379
> rows=1 loops=1)
> Output: ARRAY[fact_pages.idx]
> Recheck Cond: (fact_pages.attribute_idxs &&
> '{300000160}'::integer[])
> Heap Blocks: exact=1
> Buffers: shared hit=148
> -> Bitmap Index Scan on fact_pages_attribute_idxs_int
> (cost=0.00..967.92 rows=103556 width=0) (actual time=14.865..14.865
> rows=101462 loops=1)
> Index Cond: (fact_pages.attribute_idxs &&
> '{300000160}'::integer[])
> Buffers: shared hit=147
> Query Identifier: 6779965332684941204
> Planning:
> Buffers: shared hit=2
> Planning Time: 0.162 ms
> JIT:
> Functions: 10
> Options: Inlining true, Optimization true, Expressions true, Deforming
> true
> Timing: Generation 1.507 ms, Inlining 9.797 ms, Optimization 54.902 ms,
> Emission 14.314 ms, Total 80.521 ms
> Execution Time: 93766.772 ms

Query:

EXPLAIN (
> ANALYZE,
> VERBOSE,
> COSTS,
> BUFFERS,
> TIMING
> )
> SELECT
> fu.w2_page_idxs
> FROM
> fact_users
> AS fu
> WHERE
> EXISTS (
> SELECT
> FROM
> (
> SELECT
> ARRAY[idx] AS page_idx
> FROM
> fact_pages
> WHERE
> attribute_idxs && ARRAY[300000160]
> FETCH FIRST 1 ROWS ONLY
> )
> AS fp
> WHERE
> fu.w2_page_idxs && fp.page_idx
> )
> ;

Without any surprises, the planner is using a sequential scan on the
"fact_users" table which is very large instead of using the GIN index set
on the "w2_page_idxs" column.

Link to the query plan visualiser: https://explain.dalibo.com/plan/1vC

Thank you very much in advance,

Mickael

On Wed, Apr 27, 2022 at 4:54 PM Mickael van der Beek <
mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com> wrote:

> Hello Jeff,
>
> I have waited a few hours without the query ever finishing which is the
> reason I said "never finishes".
> Especially because the INNER JOIN version finishes within a few minutes
> while being combinatorial and less efficient.
> The query probably only does sequential scans.
>
> You will find the query plan using EXPLAIN here:
> - Visual query plan: https://explain.dalibo.com/plan#plan
> - Raw query plan: https://explain.dalibo.com/plan#raw
>
> Thanks for your help,
>
> Mickael
>
> On Wed, Apr 27, 2022 at 4:28 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> On Wed, Apr 27, 2022 at 8:19 AM Mickael van der Beek <
>> mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com> wrote:
>>
>>>
>>> The last query does not finish after waiting for more than 15 minutes.
>>> (The temporary view creation is very fast and required due to the same
>>> query in a CTE greatly reducing performance (by more than 5 min.) due to
>>> the optimisation barrier I'm guessing.)
>>>
>>
>> How much over 15 minutes? 20 minutes doesn't seem that long to wait to
>> get a likely definitive answer. But at the least show us the EXPLAIN
>> without ANALYZE of it, that should take no milliseconds.
>>
>> And what does it mean for something to take 5 minutes longer than "never
>> finishes"?
>>
>> (Also, putting every or every other token on a separate line does not
>> make it easier to read)
>>
>> Cheer,
>>
>> Jeff
>>
>>>
>
> --
> Mickael van der BeekWeb developer & Security analyst
>
> mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com
>

--
Mickael van der BeekWeb developer & Security analyst

mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message aditya desai 2022-05-20 15:40:37 Re: Selecting RAM and CPU based on max_connections
Previous Message Laurenz Albe 2022-05-20 10:37:17 Re: Need help on Query Tunning and Not using the Index Scan