Re: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)

From: Dmitry Koterov <dmitry(dot)koterov(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)
Date: 2021-04-15 03:43:59
Message-ID: CA+CZih5_FdgoS5chECRQ7EkQM0WJ-faQyM8D8+50aRWWhiu9zA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Yeah, that was a plan for a query before its simplification. But effect is
still the same, and also the question is still the same - why a bitmap scan
is preferred over a number of individual index scans with fetching first 50
elements from each. (Also, replacing LIMIT 50 to LIMIT 2 doesn't seem to
change anything, although having 2 here should logically make it prefer 8
index scans with selecting 2 first elements from each over selecting 186051
rows in one bitmap index scan.)

Here is the plan for the exact query with LIMIT=2:

CREATE TABLE roles(
id bigint NOT NULL,
id1 bigint,
created_at timestamptz NOT NULL
);
CREATE INDEX roles_asset_created_desc ON roles(id1, created_at DESC);

EXPLAIN ANALYZE SELECT * FROM roles
WHERE id1 IN(
'1001361878439251615', '1001349402553202617', '1001329448424677858',
'1001348457743394950', '1001361706624116300', '1001338330225145648',
'1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 2;

Limit (cost=49712.75..49712.99 rows=2 width=42) (actual
time=82.611..83.462 rows=2 loops=1)
-> Gather Merge (cost=49712.75..67421.89 rows=151782 width=42) (actual
time=82.611..83.459 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=48712.73..48902.46 rows=75891 width=42) (actual
time=70.404..70.404 rows=2 loops=3)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Bitmap Heap Scan on roles
(cost=4266.99..47953.82 rows=75891 width=42) (actual time=7.854..57.664
rows=61255 loops=3)
Recheck Cond: (id1 = ANY
('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
Heap Blocks: exact=6886
-> Bitmap Index Scan on roles_asset_created_desc
(cost=0.00..4221.46 rows=182139 width=0) (actual time=14.031..14.031
rows=186051 loops=1)
Index Cond: (id1 = ANY
('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
Planning Time: 0.074 ms
Execution Time: 83.491 ms

On Wed, Apr 14, 2021 at 5:41 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> Your query and explain analyze output do not seem to match.
>
> Filter: (cred_id = '1001344096118566254'::bigint)
>
> I don't see anything like that in your query, nor an index that would
> support accomplishing that without filtering after fetching the 184k rows
> initially like the planner does.
>
>>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2021-04-15 10:34:06 Re: Vulnerability PostgreSQL 11.2
Previous Message Michael Lewis 2021-04-15 00:41:18 Re: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)