From: | Marco Colli <collimarco91(at)gmail(dot)com> |
---|---|
To: | Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Slow "not in array" operation |
Date: | 2019-11-12 20:06:34 |
Message-ID: | CAFvCgN4txnTT3Uev-YYOPaExPxX2xBYxq46BCohTUzWfWsmbgQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
1) It is running on a DigitalOcean CPU-optimized droplet with dedicated
hyperthreads (16 cores) and SSD.
SHOW random_page_cost; => 2
2) What config names should I check exactly? I used some suggestions from
the online PGTune, when I first configured the db some months ago:
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
3) Here's the query plan that I get after disabling the seq scan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
time=94972.253..94972.254 rows=1 loops=1)
-> Gather (cost=2183938.16..2183938.87 rows=7 width=8) (actual
time=94952.895..95132.626 rows=8 loops=1)
Workers Planned: 7
Workers Launched: 7
-> Partial Aggregate (cost=2182938.16..2182938.17 rows=1
width=8) (actual time=94950.958..94950.958 rows=1 loops=8)
-> Parallel Bitmap Heap Scan on subscriptions
(cost=50294.50..2180801.47 rows=854677 width=0) (actual
time=1831.342..94895.208 rows=611828 loops=8)
Recheck Cond: ((project_id = 123) AND (trashed_at IS
NULL))
Rows Removed by Index Recheck: 2217924
Filter: (NOT (tags @> '{en}'::character varying[]))
Rows Removed by Filter: 288545
Heap Blocks: exact=120301 lossy=134269
-> Bitmap Index Scan on
index_subscriptions_on_project_id_and_tags (cost=0.00..48798.81
rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)
Index Cond: (project_id = 123)
Planning Time: 1.273 ms
Execution Time: 95132.766 ms
(15 rows)
On Tue, Nov 12, 2019 at 8:20 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:
> It is very interesting to me that the optimizer chose a parallel
> sequential scan rather than an index scan on either of your indexes that
> start with project_id that also reference trashed_at.
>
> 1) Are you running on SSD type storage? Has random_page_cost been lowered
> to 1-1.5 or so (close to 1 assumes good cache hits)?
> 2) It seems you have increased parallel workers. Have you also changed the
> startup or other cost configs related to how inclined the system is to use
> sequential scans?
> 3) If you disable sequential scan, what does the plan look like for this
> query? (SET ENABLE_SEQSCAN TO OFF;)
>
>>
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Lewis | 2019-11-12 20:31:06 | Re: Slow "not in array" operation |
Previous Message | Justin Pryzby | 2019-11-12 19:53:34 | Re: Slow "not in array" operation |