Re: Sudden insert performance degradation

From: Henrique Montenegro <typoon(at)gmail(dot)com>
To: Sebastian Dressler <sebastian(at)swarm64(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sudden insert performance degradation
Date: 2020-07-13 17:02:09
Message-ID: CAH_aqbt2XAKQTeZePT54mypBy2VcUN9Y-TAHrhOTkHqjOMhtPg@mail.gmail.com
Lists: pgsql-performance

On Mon, Jul 13, 2020 at 12:50 PM Sebastian Dressler <sebastian(at)swarm64(dot)com>
wrote:

> Hi Henrique,
>
> On 13. Jul 2020, at 18:42, Henrique Montenegro <typoon(at)gmail(dot)com> wrote:
>
> On Mon, Jul 13, 2020 at 11:20 AM Sebastian Dressler <sebastian(at)swarm64(dot)com>
> wrote:
>
>
>> Running the above loop worked fine for about 12 hours. Each file was
>> taking
>> about 30 seconds to be processed. About 4 seconds to create the `users`
>> table
>> and have the CSV data loaded into it and anything between 20 and 30
>> seconds to
>> insert the data from `users` into `users_no_dups`.
>>
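
(For anyone following along, a minimal sketch of that per-file load; the
staging schema, CSV path, and ON CONFLICT handling here are assumptions for
illustration, not the exact script:)

    -- Sketch only: column list, file path, and conflict handling assumed.
    CREATE TABLE users (
        user_id    bigint,
        created_ts timestamptz
    );

    COPY users FROM '/path/to/file.csv' WITH (FORMAT csv);

    -- Deduplicate against the unique key while inserting.
    INSERT INTO users_no_dups (user_id, created_ts)
    SELECT user_id, created_ts
    FROM users
    ON CONFLICT (user_id) DO NOTHING;

    DROP TABLE users;
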
>>
>> Do you see anything suspicious in the logs, i.e. something in the realms
>> of running out of transaction IDs?
>>
>
> I set the log level to debug1. I haven't seen anything that caught my
> attention, but I am not really sure what to look for, so perhaps I missed
> it. Any suggestions on what to look for, or any specific logging
> configuration to use?
>
>
> Not necessarily; if you'd run out of tx IDs you would notice that clearly,
> I guess. I also think that this is not the issue.
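
(For reference, wraparound pressure is easy to rule out with a catalog
query; this is standard PostgreSQL, nothing specific to this setup:)

    -- Compare against autovacuum_freeze_max_age (200 million by default);
    -- values far below ~2 billion mean xid wraparound is not the problem.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;
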
>
>
>
>>
>> [...]
>>
>>
>> Recreating the table now isn't really providing any improvement. I tried
>> recreating it with a `fillfactor` of `10`, but it was taking too long and
>> using too much space (the table was 300GB with the fillfactor set to 30;
>> with it set to 10 it went up to almost 1TB).
>>
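
(Side note on why the size explodes: `fillfactor` reserves free space in
every heap page, so a setting of 10 leaves roughly 90% of each page empty,
which is consistent with the 300GB -> ~1TB growth above. A sketch with an
assumed schema:)

    -- Schema assumed for illustration; only the storage parameter matters.
    CREATE TABLE users_no_dups (
        user_id    bigint PRIMARY KEY,
        created_ts timestamptz
    ) WITH (fillfactor = 10);
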
>>
>> To me it sounds like the UK/PK index is getting too expensive to write to
>> as it grows. A possible solution could be to start partitioning the table.
>>
>
> I thought about partitioning it, but I can't figure out on what. The
> `user_id` column is a number that is somewhat random, so I don't know what
> kind of ranges I would use for it. I will try to look at the values again
> and see if there is something that I could perhaps use as a range. Any
> other suggestions?
>
>
> Depending on granularity, maybe partition on `created_ts`?
>

I could give that a try. The reason I hadn't tried it yet is that, since the
UK is on the `user_id` column, I thought partitioning on another column
wouldn't give me any benefit, but I can't really justify why I was thinking
that. I would assume that the constraint is validated against the index and
not the whole table, so this might work. I will give it a try.
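
For the record, a rough sketch of what I have in mind (column names
assumed). One caveat I need to check: PostgreSQL requires the partition key
to be part of any unique constraint on a partitioned table, so the UK would
have to become `(user_id, created_ts)` and would no longer enforce
uniqueness of `user_id` on its own.

    -- Rough sketch, schema assumed.
    CREATE TABLE users_no_dups (
        user_id    bigint      NOT NULL,
        created_ts timestamptz NOT NULL,
        UNIQUE (user_id, created_ts)  -- must include the partition key
    ) PARTITION BY RANGE (created_ts);

    CREATE TABLE users_no_dups_2020_07
        PARTITION OF users_no_dups
        FOR VALUES FROM ('2020-07-01') TO ('2020-08-01');
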

Thanks!

Henrique

>
> Cheers,
> Sebastian
>
> --
>
> Sebastian Dressler, Solution Architect
> +49 30 994 0496 72 | sebastian(at)swarm64(dot)com
>
> Swarm64 AS
> Parkveien 41 B | 0258 Oslo | Norway
> Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787
> CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender
> (Styrets Leder): Dr. Sverre Munck
>
> Swarm64 AS Zweigstelle Hive
> Ullsteinstr. 120 | 12109 Berlin | Germany
> Registered at Amtsgericht Charlottenburg - HRB 154382 B
>
>
