Re: Sudden insert performance degradation

From: Henrique Montenegro <typoon(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sudden insert performance degradation
Date: 2020-07-14 01:02:14
Message-ID: CAH_aqbsPRw82X-=wT8irNLPs6j_qFrrABiq2ERw=qOYmskxGFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 13, 2020 at 8:05 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Mon, Jul 13, 2020 at 10:23 AM Henrique Montenegro <typoon(at)gmail(dot)com>
> wrote:
>
> insert into users_no_dups (
>> created_ts,
>> user_id,
>> name,
>> url
>> ) (
>> select
>> created_ts,
>> user_id,
>> name,
>> url
>> from
>> users
>> ) on conflict do nothing
>>
>
> Once the size of the only index exceeds shared_buffers by a bit (the
> amount of "a bit" depends on your RAM, kernel version, settings
> for dirty_background_ratio, dirty_expire_centisecs, and probably other
> things, and is not easy to predict) the performance falls off a cliff when
> inserting values in a random order. Every insert dirties a random index
> leaf page, which quickly gets evicted from shared_buffers to make room for
> other random leaf pages to be read in, and then turns into flush calls when
> the kernel freaks out about the amount and age of dirty pages held in
> memory.
>

That is interesting to know. I will do some research on those things.

> What happens if you add an "ORDER BY user_id" to your above select?
>

I don't know. I will give it a try right now.

>
>
>> shared_buffers = 8GB
>> RAM: 256GB
>>
>
> Or, crank up shared_buffers by a lot. Like, beyond the size of the
> growing index, or up to 240GB if the index ever becomes larger than that.
> And make the time between checkpoints longer. If the dirty buffers are
> retained in shared_buffers longer, chances of them getting dirtied
> repeatedly between writes is much higher than if you just toss them to the
> kernel and hope for the best.
>
>
I cranked it up to 160GB to see how it goes.

Cheers,
>
> Jeff
>

I created the partitions as well as mentioned before. I was able to
partition the table based on the user_id (found some logic to it). I was
transferring the data from the original table (about 280 million records;
320GB) to the new partitioned table and things were going well with write
speeds between 30MB/s and 50MB/s. After reading 270GB of the 320GB (in 4
and a half hours) and writing it to the new partitioned table, write speed
went down to 7KB/s. It is so frustrating.

I will keep the partitions and try your suggestions to see how it goes.

I apologize for the long time between replies, it is just that testing this
stuff takes 4+ hours each run.

If there are any other suggestions of things for me to look meanwhile as
well, please keep them coming.

Thanks!

Henrique

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Henrique Montenegro 2020-07-14 13:05:19 Re: Sudden insert performance degradation
Previous Message Jeff Janes 2020-07-14 00:05:07 Re: Sudden insert performance degradation