Re: Sudden insert performance degradation

From: Henrique Montenegro <typoon(at)gmail(dot)com>
To: Sebastian Dressler <sebastian(at)swarm64(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sudden insert performance degradation
Date: 2020-07-15 12:24:25
Message-ID: CAH_aqbtih-OGsgv6pAOD+kQauSZO0e5BF-R4eatAETC6eEYYAw@mail.gmail.com
Lists: pgsql-performance

On Wed, Jul 15, 2020 at 4:03 AM Sebastian Dressler <sebastian(at)swarm64(dot)com>
wrote:

> Hi Henrique,
>
> On 15. Jul 2020, at 03:13, Henrique Montenegro <typoon(at)gmail(dot)com> wrote:
> [...]
>
> ```
> ssl = off
> shared_buffers = 160GB # min 128kB
> work_mem = 96GB # min 64kB
> maintenance_work_mem = 12GB # min 1MB
> max_stack_depth = 4MB # min 100kB
> dynamic_shared_memory_type = posix # the default is the first option
> synchronous_commit = off # synchronization level;
> commit_delay = 100000 # range 0-100000, in microseconds
> max_wal_size = 3GB
> min_wal_size = 1GB
> min_parallel_index_scan_size = 64kB
> effective_cache_size = 96GB
> log_min_messages = debug1 # values in order of decreasing detail:
> log_checkpoints = on
> log_error_verbosity = verbose # terse, default, or verbose messages
> log_line_prefix = '%m [%p] %q%u(at)%d ' # special values:
> log_lock_waits = on # log lock waits >= deadlock_timeout
> log_timezone = 'America/New_York'
> log_executor_stats = on
> datestyle = 'iso, mdy'
> ```
>
> [...]
>
> Limit | Time (seconds)
> ---------|------------------
> 10 | 0.6
> 100 | 0.6
> 1000 | 1.3
> 10000 | 116.9
> 100000 | 134.8
> 1000000 | 193.2
>
> Notice the jump in execution time from the 1k limit to the 10k limit: the
> amount of data increased 10x, but the execution time increased 100x.
>
> It seems to me that inserting the data in this case is slow because
> identifying the duplicate records (which I assume is done in a fashion
> similar to the queries above) takes a long time.
>
> I have attached the `explain analyze` output for the 1k and 10k queries to
> this email (they are 4k+ lines each; I didn't want to make this message
> bigger than it already is).
>
> * exp1k.txt
> * exp10k.txt
>
> [...]
>
>
> I quickly glanced at the exp10k plan and there are some things I noticed
> (sorry for not going over the whole mail, I have to re-read it again):
>
> - There are a lot of partitions now; you may want to consider reducing their
> number. It seems to me that you are overloading the system: scan times are
> low, but the overhead to start a scan is likely quite high.
> - work_mem = 96GB seems very high to me; I think you would be better off with
> e.g. 4GB as a start but many more parallel workers. For instance, depending on
> your machine, try adjusting max_worker_processes, max_parallel_workers and
> max_parallel_workers_per_gather. The values depend a bit on your system; make
> sure that max_parallel_workers_per_gather is much lower than
> max_parallel_workers, which in turn must be lower than max_worker_processes.
> You can try large values, for instance 128, 120 and 12.
> - You may want to test with min_parallel_table_scan_size = 0
> - Did you enable partition pruning, partitionwise join and aggregate?
>
> Thanks,
> Sebastian
>
> --
>
> Sebastian Dressler, Solution Architect
> +49 30 994 0496 72 | sebastian(at)swarm64(dot)com
>
> Swarm64 AS
> Parkveien 41 B | 0258 Oslo | Norway
> Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787
> CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender
> (Styrets Leder): Dr. Sverre Munck
>
> Swarm64 AS Zweigstelle Hive
> Ullsteinstr. 120 | 12109 Berlin | Germany
> Registered at Amtsgericht Charlottenburg - HRB 154382 B
>
>
Hi Sebastian,

That is a good idea about the parallel workers. I have updated them and will
post the results as soon as I have them.
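
For reference, applying your suggestion would look roughly like the sketch
below. The values (128 / 120 / 12 workers and 4GB of work_mem) are taken
straight from your numbers and are only a starting point for this machine, not
tuned settings:

```
-- Sketch only: the worker/work_mem values suggested above, applied cluster-wide.
ALTER SYSTEM SET max_worker_processes = 128;           -- needs a server restart
ALTER SYSTEM SET max_parallel_workers = 120;
ALTER SYSTEM SET max_parallel_workers_per_gather = 12;
ALTER SYSTEM SET work_mem = '4GB';                      -- down from 96GB
SELECT pg_reload_conf();  -- picks up everything except max_worker_processes
```
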
Regarding partition pruning, it is set to the default (which is on).
partitionwise_join and partitionwise_aggregate are both set to off; I will
turn them on as well and see how it goes.
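
Assuming these are the enable_* planner GUCs, turning them on for a test
session would be roughly:

```
-- Sketch: the planner settings mentioned above (assuming the enable_* GUCs).
SET enable_partition_pruning = on;        -- already the default
SET enable_partitionwise_join = on;       -- defaults to off
SET enable_partitionwise_aggregate = on;  -- defaults to off
```

If they help, I can make them permanent with ALTER SYSTEM afterwards.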

Thanks for the suggestions! I will keep the list updated.

Henrique
