Re: Sudden insert performance degradation

From: Henrique Montenegro <typoon(at)gmail(dot)com>
To: Sebastian Dressler <sebastian(at)swarm64(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sudden insert performance degradation
Date: 2020-07-13 16:42:37
Message-ID: CAH_aqbuGg7QcRsWjs6fog20RqoF4XG6WdnH+BTp2dzhMUg-KSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 13, 2020 at 11:20 AM Sebastian Dressler <sebastian(at)swarm64(dot)com>
wrote:

> Hi Henrique,
>
> On 13. Jul 2020, at 16:23, Henrique Montenegro <typoon(at)gmail(dot)com> wrote:
>
> [...]
>
> * Insert the data from the `users` table into the `users_no_dups` table
>
> ```
> insert into users_no_dups (
> created_ts,
> user_id,
> name,
> url
> ) (
> select
> created_ts,
> user_id,
> name,
> url
> from
> users
> ) on conflict do nothing
> ```
>
>
> How do you check contraints here? Is this enforced with UK/PK?
>

The Unique Key is supposed to to the constraint enforcing here. The `users`
table will have data that is duplicate and the maximum number of records on
it is 1 million. Then I just try to insert it into the `users_no_dups`
table with the `on conflict do nothing` to ignore the duplicates and
discard them.

> Running the above loop worked fine for about 12 hours. Each file was taking
> about 30 seconds to be processed. About 4 seconds to create the `users`
> table
> and have the CSV data loaded into it and anything between 20 and 30
> seconds to
> insert the data from `users` into `users_no_dups`.
>
>
> Do you see anything suspicious in the logs, i.e. something in the realms
> of running out of transaction IDs?
>

I set the log to debug1. I haven't seen anything that called my attention,
but I am not really sure what to look for, so perhaps I missed it. Any
suggestions on what to look for or any specific log configuration to do?

>
> [...]
>
>
> Recreating the table now isn't really providing any improvements. I tried
> recreating it with a `fillfactor` of `10`, but it was taking too long and
> too
> much space (the table had 300GB with the fillfactor set to 30; with it set
> to
> 10 it went up to almost 1TB).
>
>
> To me it sounds like the UK/PK is getting too much to write. A possible
> solution could be to start partitioning the table.
>

I thought about partitioning it, but I can't figure out on what. The
`user_id` column is a number that is somewhat random so I don't know what
kinds of range I would use for it. I will try to look at the values again
and see if there is something that I could perhaps use as a range. Any
other suggestions?

>
> [...]
> ```
> ssl = off
> shared_buffers = 8GB
> work_mem = 12GB
> maintenance_work_mem = 12GB
> max_stack_depth = 4MB
> synchronous_commit = off
> wal_writer_flush_after = 128MB
> max_wal_size = 32GB
> min_wal_size = 80MB
> effective_cache_size = 96GB
> ```
>
>
> Another suggestion would be to increase the min_wal_size here, but since
> you use UNLOGGED tables it does not matter much.
>
>
> Information about the machine:
>
> ```
> Processor: 2x Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz (6 cores each, 12
> threads)
> RAM: 256GB
>
>
> Disk1: 2TB SSD SATA-3 Samsung Evo 860
> Disk2: 6TB HDD SATA-3 Seagate Exos Enterprise 7200RPM
> Disk3: 8TB HDD SATA-3 Seagate Exos Enterprise 7200RPM
>
> Disk1 and Disk2 are configured as a single logical volume.
>
>
> Just curious: does that mean you mix up SSD + HDD?
>

Yeah, I did that. Probably not very smart of me. I plan on undoing it soon.
I assumed that is not what is causing my issue since the tablespace where
the table is stored is on `Disk3` which is not part of the Logical Volume.

>
> Cheers,
> Sebastian
>
>
> --
>
> Sebastian Dressler, Solution Architect
> +49 30 994 0496 72 | sebastian(at)swarm64(dot)com
>
> Swarm64 AS
> Parkveien 41 B | 0258 Oslo | Norway
> Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787
> CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender
> (Styrets Leder): Dr. Sverre Munck
>
> Swarm64 AS Zweigstelle Hive
> Ullsteinstr. 120 | 12109 Berlin | Germany
> Registered at Amtsgericht Charlottenburg - HRB 154382 B
>
>
Thanks!

Henrique

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Henrique Montenegro 2020-07-13 16:48:53 Re: Sudden insert performance degradation
Previous Message Michael Lewis 2020-07-13 16:27:39 Re: Sudden insert performance degradation