Re: Slow "not in array" operation

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: Marco Colli <collimarco91(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-12 20:31:06
Message-ID: CAHOFxGox3eZvjP8WLbif7-f7oeJ430PfmU8rMr8+14X159Xk3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Odd index choice by the optimizer given what is available. The bitmap being
lossy means more work_mem is needed if I remember properly.

It is interesting that skipping the where condition on the array is only
half a second. Is the array being toasted or is it small and being stored
in the same file as primary table?

What is the result for this count query? Is it roughly 4 million?

On Tue, Nov 12, 2019, 1:06 PM Marco Colli <collimarco91(at)gmail(dot)com> wrote:

> 1) It is running on a DigitalOcean CPU-optimized droplet with dedicated
> hyperthreads (16 cores) and SSD.
> SHOW random_page_cost; => 2
>
> 2) What config names should I check exactly? I used some suggestions from
> the online PGTune, when I first configured the db some months ago:
> max_worker_processes = 16
> max_parallel_workers_per_gather = 8
> max_parallel_workers = 16
>
> 3) Here's the query plan that I get after disabling the seq scan:
>
>
> QUERY PLAN
>
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> time=94972.253..94972.254 rows=1 loops=1)
>
> -> Gather (cost=2183938.16..2183938.87 rows=7 width=8) (actual
> time=94952.895..95132.626 rows=8 loops=1)
>
> Workers Planned: 7
>
> Workers Launched: 7
>
> -> Partial Aggregate (cost=2182938.16..2182938.17 rows=1
> width=8) (actual time=94950.958..94950.958 rows=1 loops=8)
>
> -> Parallel Bitmap Heap Scan on subscriptions
> (cost=50294.50..2180801.47 rows=854677 width=0) (actual
> time=1831.342..94895.208 rows=611828 loops=8)
>
> Recheck Cond: ((project_id = 123) AND (trashed_at IS
> NULL))
>
> Rows Removed by Index Recheck: 2217924
>
> Filter: (NOT (tags @> '{en}'::character varying[]))
>
> Rows Removed by Filter: 288545
>
> Heap Blocks: exact=120301 lossy=134269
>
> -> Bitmap Index Scan on
> index_subscriptions_on_project_id_and_tags (cost=0.00..48798.81
> rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)
>
> Index Cond: (project_id = 123)
>
> Planning Time: 1.273 ms
>
> Execution Time: 95132.766 ms
>
> (15 rows)
>
>
> On Tue, Nov 12, 2019 at 8:20 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:
>
>> It is very interesting to me that the optimizer chose a parallel
>> sequential scan rather than an index scan on either of your indexes that
>> start with project_id that also reference trashed_at.
>>
>> 1) Are you running on SSD type storage? Has random_page_cost been lowered
>> to 1-1.5 or so (close to 1 assumes good cache hits)?
>> 2) It seems you have increased parallel workers. Have you also changed
>> the startup or other cost configs related to how inclined the system is to
>> use sequential scans?
>> 3) If you disable sequential scan, what does the plan look like for this
>> query? (SET ENABLE_SEQSCAN TO OFF;)
>>
>>>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2019-11-12 20:50:51 Re: Slow "not in array" operation
Previous Message Marco Colli 2019-11-12 20:06:34 Re: Slow "not in array" operation