Re: Slow "not in array" operation

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Marco Colli <collimarco91(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-13 11:30:10
Message-ID: CAMkU=1xwG6OZd7tzi3+jCmt=QhCX6KTj4027iN7xAT42nA=LmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Nov 13, 2019 at 4:20 AM Marco Colli <collimarco91(at)gmail(dot)com> wrote:

> Replying to the previous questions:
> - work_mem = 64MB (there are hundreds of connections)
> - the project 123 has more than 7M records, and those that don't have the
> tag 'en' are 4.8M
>
>
>> What was the plan for the one that took 500ms?
>
>
> This is the query / plan without the filter on tags:
>
> SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
> 123 AND "subscriptions"."trashed_at" IS NULL;
>
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Finalize Aggregate (cost=291342.67..291342.68 rows=1 width=8) (actual
> time=354.556..354.556 rows=1 loops=1)
> -> Gather (cost=291342.05..291342.66 rows=6 width=8) (actual
> time=354.495..374.305 rows=7 loops=1)
> Workers Planned: 6
> Workers Launched: 6
> -> Partial Aggregate (cost=290342.05..290342.06 rows=1 width=8)
> (actual time=349.799..349.799 rows=1 loops=7)
> -> Parallel Index Only Scan using
> index_subscriptions_on_project_id_and_uid on subscriptions
> (cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018
> rows=1030593 loops=7)
> Index Cond: (project_id = 123)
> Heap Fetches: 280849
> Planning Time: 0.753 ms
> Execution Time: 374.483 ms
> (10 rows)
>

My previous comment about the bitmap index scan taking half the time was a
slip of the eye, I was comparing *cost* of the bitmap index scan to the
*time* of the overall plan. But then the question is, why isn't it doing
an index-only scan on "index_subscriptions_on_project_id_and_tags"? And
the answer is that is because it is a GIN index. Make the same index only
as btree, and you should get good performance as it can filter the tags
within a given project without visiting the table.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Marco Colli 2019-11-13 11:56:22 Re: Slow "not in array" operation
Previous Message Rick Otten 2019-11-13 11:18:16 Re: Slow "not in array" operation