Re: Slow "not in array" operation

From: Marco Colli <collimarco91(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-13 09:20:02
Message-ID: CAFvCgN6d0cTQuB_Ye4HHyh61ngWR3UR0O-kWG7CnEbrU38X8-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Replying to the previous questions:
- work_mem = 64MB (there are hundreds of connections)
- the project 123 has more than 7M records, and those that don't have the
tag 'en' are 4.8M

> What was the plan for the one that took 500ms?

This is the query / plan without the filter on tags:

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
123 AND "subscriptions"."trashed_at" IS NULL;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=291342.67..291342.68 rows=1 width=8) (actual
time=354.556..354.556 rows=1 loops=1)
-> Gather (cost=291342.05..291342.66 rows=6 width=8) (actual
time=354.495..374.305 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
-> Partial Aggregate (cost=290342.05..290342.06 rows=1 width=8)
(actual time=349.799..349.799 rows=1 loops=7)
-> Parallel Index Only Scan using
index_subscriptions_on_project_id_and_uid on subscriptions
(cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018
rows=1030593 loops=7)
Index Cond: (project_id = 123)
Heap Fetches: 280849
Planning Time: 0.753 ms
Execution Time: 374.483 ms
(10 rows)

Then if I simply add the exclusion of a single tag, it goes from a few
milliseconds to 70s...

On Wed, Nov 13, 2019 at 12:33 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

>
>> 3) Here's the query plan that I get after disabling the seq scan:
>>
>>
>> QUERY PLAN
>>
>>
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
>> time=94972.253..94972.254 rows=1 loops=1)
>>
>> -> Gather (cost=2183938.16..2183938.87 rows=7 width=8) (actual
>> time=94952.895..95132.626 rows=8 loops=1)
>>
>> Workers Planned: 7
>>
>> Workers Launched: 7
>>
>> -> Partial Aggregate (cost=2182938.16..2182938.17 rows=1
>> width=8) (actual time=94950.958..94950.958 rows=1 loops=8)
>>
>> -> Parallel Bitmap Heap Scan on subscriptions
>> (cost=50294.50..2180801.47 rows=854677 width=0) (actual
>> time=1831.342..94895.208 rows=611828 loops=8)
>>
>> Recheck Cond: ((project_id = 123) AND (trashed_at IS
>> NULL))
>>
>> Rows Removed by Index Recheck: 2217924
>>
>> Filter: (NOT (tags @> '{en}'::character varying[]))
>>
>> Rows Removed by Filter: 288545
>>
>> Heap Blocks: exact=120301 lossy=134269
>>
>> -> Bitmap Index Scan on
>> index_subscriptions_on_project_id_and_tags (cost=0.00..48798.81
>> rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)
>>
>> Index Cond: (project_id = 123)
>>
>> Planning Time: 1.273 ms
>>
>> Execution Time: 95132.766 ms
>>
>> (15 rows)
>>
>
> What was the plan for the one that took 500ms? I don't see how it is
> possible that this one is 180 times slower than that one. Maybe a hot
> cache versus cold cache? Also, it seems weird to me that "trashed_at IS
> NULL" shows up in the recheck but not in the original Index Cond.
> Increasing work_mem can also help, but since the Bitmap Index Scan itself
> took half the time there is only so much it can do.
>
> Cheers,
>
> Jeff
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Morris de Oryx 2019-11-13 10:46:10 Re: Slow "not in array" operation
Previous Message Jeff Janes 2019-11-12 23:33:28 Re: Slow "not in array" operation