Re: Slow "not in array" operation

From: Marco Colli <collimarco91(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-12 19:04:15
Message-ID: CAFvCgN6bQhf03jdakdk1eVMad8r=6nFV=cmy2v_sp0f8_kZ-bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

To be honest, I have simplified the question above. In order to show you
the plan, I must show you the actual query, which is this:

=== QUERY ===

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
123 AND "subscriptions"."trashed_at" IS NULL AND NOT (tags @>
ARRAY['en']::varchar[]);

=== QUERY PLAN ===

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------

Finalize Aggregate (cost=2152593.04..2152593.05 rows=1 width=8) (actual
time=70555.561..70555.561 rows=1 loops=1)

-> Gather (cost=2152592.31..2152593.02 rows=7 width=8) (actual
time=70540.641..70702.365 rows=8 loops=1)

Workers Planned: 7

Workers Launched: 7

-> Partial Aggregate (cost=2151592.31..2151592.32 rows=1
width=8) (actual time=70537.376..70537.377 rows=1 loops=8)

-> Parallel Seq Scan on subscriptions
(cost=0.00..2149490.49 rows=840731 width=0) (actual time=0.742..70479.359
rows=611828 loops=8)

Filter: ((trashed_at IS NULL) AND (NOT (tags @>
'{en}'::character varying[])) AND (project_id = 123))

Rows Removed by Filter: 4572769

Planning Time: 1.304 ms

Execution Time: 70702.463 ms

(10 rows)

=== INDEXES ===

Indexes:

"subscriptions_pkey" PRIMARY KEY, btree (id)

"index_subscriptions_on_project_id_and_created_at" btree (project_id,
created_at DESC)

"index_subscriptions_on_project_id_and_tags" gin (project_id, tags)
WHERE trashed_at IS NULL

"index_subscriptions_on_project_id_and_trashed_at" btree (project_id,
trashed_at DESC)

=== NOTES ===

Running the query without the last filter on tags takes only 500ms.
Unfortunately I cannot make strict assumptions on data or tags: for example
I also have to count subscriptions in a project that don't have tag A and
don't have tag B, etc. This means that I cannot simply calculate the total
and then make a subtraction.

On Tue, Nov 12, 2019 at 7:40 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> What's the plan for the slow one? What's the time to just count all rows?
>
>>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Lewis 2019-11-12 19:20:10 Re: Slow "not in array" operation
Previous Message Michael Lewis 2019-11-12 18:39:46 Re: Slow "not in array" operation