Re: Slow "not in array" operation

From: Marco Colli <collimarco91(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-12 20:06:34
Message-ID: CAFvCgN4txnTT3Uev-YYOPaExPxX2xBYxq46BCohTUzWfWsmbgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

1) It is running on a DigitalOcean CPU-optimized droplet with dedicated
hyperthreads (16 cores) and SSD.
SHOW random_page_cost; => 2

2) What config names should I check exactly? I used some suggestions from
the online PGTune, when I first configured the db some months ago:
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16

3) Here's the query plan that I get after disabling the seq scan:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
time=94972.253..94972.254 rows=1 loops=1)

-> Gather (cost=2183938.16..2183938.87 rows=7 width=8) (actual
time=94952.895..95132.626 rows=8 loops=1)

Workers Planned: 7

Workers Launched: 7

-> Partial Aggregate (cost=2182938.16..2182938.17 rows=1
width=8) (actual time=94950.958..94950.958 rows=1 loops=8)

-> Parallel Bitmap Heap Scan on subscriptions
(cost=50294.50..2180801.47 rows=854677 width=0) (actual
time=1831.342..94895.208 rows=611828 loops=8)

Recheck Cond: ((project_id = 123) AND (trashed_at IS
NULL))

Rows Removed by Index Recheck: 2217924

Filter: (NOT (tags @> '{en}'::character varying[]))

Rows Removed by Filter: 288545

Heap Blocks: exact=120301 lossy=134269

-> Bitmap Index Scan on
index_subscriptions_on_project_id_and_tags (cost=0.00..48798.81
rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)

Index Cond: (project_id = 123)

Planning Time: 1.273 ms

Execution Time: 95132.766 ms

(15 rows)

On Tue, Nov 12, 2019 at 8:20 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> It is very interesting to me that the optimizer chose a parallel
> sequential scan rather than an index scan on either of your indexes that
> start with project_id that also reference trashed_at.
>
> 1) Are you running on SSD type storage? Has random_page_cost been lowered
> to 1-1.5 or so (close to 1 assumes good cache hits)?
> 2) It seems you have increased parallel workers. Have you also changed the
> startup or other cost configs related to how inclined the system is to use
> sequential scans?
> 3) If you disable sequential scan, what does the plan look like for this
> query? (SET ENABLE_SEQSCAN TO OFF;)
>
>>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Lewis 2019-11-12 20:31:06 Re: Slow "not in array" operation
Previous Message Justin Pryzby 2019-11-12 19:53:34 Re: Slow "not in array" operation