Re: Batch insert heavily affecting query performance.

From: Jean Baro <jfbaro(at)gmail(dot)com>
To: Danylo Hlynskyi <abcz2(dot)uprola(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Batch insert heavily affecting query performance.
Date: 2017-12-25 03:10:28
Message-ID: CA+fQeeno-4Pomi0f3cif0TVaCLyqCxDwGvGQfL4hBf91KEfoSg@mail.gmail.com
Lists: pgsql-performance

Thanks for the clarification guys.

This will be super useful. I'll post the results after trying it!

Merry Christmas!

On 25 Dec 2017 00:59, "Danylo Hlynskyi" <abcz2(dot)uprola(at)gmail(dot)com>
wrote:

> I had an opportunity to insert 700MM rows into Aurora PostgreSQL, for
> which performance insights are available. It turns out there are two
> stages of insert slowdown: the first happens when the max WAL buffers
> limit is reached, the second around an hour later.
>
> The first stage cuts insert performance in half, and the WALWrite lock
> is the main bottleneck. I think the WAL simply can't sync the change log
> that fast, so it waits while older log entries are flushed. This creates
> both read and write IO.
>
> The second stage is unique to Aurora/RDS and is characterized by
> excessive read data locks and total read IO. I couldn't figure out why
> it reads so much in a write-only workload, and AWS support hasn't
> answered yet.
>
> So, in your case, try to throttle inserts so the WAL is never overfilled
> and you don't experience WALWrite locks, and then increase wal_buffers
> to the maximum (a rough sketch follows).
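>
> For reference, a minimal sketch of inspecting and raising wal_buffers.
> The 16MB value is an assumption (one WAL segment, beyond which larger
> values rarely help); on RDS the change must go through the DB parameter
> group and a reboot rather than ALTER SYSTEM:
>
>     -- Current setting; the default (-1) sizes it at 1/32 of
>     -- shared_buffers, capped at 16MB.
>     SHOW wal_buffers;
>
>     -- On self-managed PostgreSQL; takes effect after a server restart.
>     ALTER SYSTEM SET wal_buffers = '16MB';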
>
> On 24 Dec 2017 21:51, "Jean Baro" <jfbaro(at)gmail(dot)com> wrote:
>
> Hi there,
>
> We are testing a new application to try to find performance issues.
>
> AWS RDS m4.large 500GB storage (SSD)
>
> One table only, called Messages:
>
> Uuid
> Country (ISO)
> Role (Text)
> User id (Text)
> GroupId (integer)
> Channel (text)
> Title (Text)
> Payload (JSON, up to 20kb)
> Starts_in (UTC)
> Expires_in (UTC)
> Seen (boolean)
> Deleted (boolean)
> LastUpdate (UTC)
> Created_by (UTC)
> Created_in (UTC)
>
> Indexes (a rough DDL sketch follows this list):
>
> UUID (PK)
> UserID + Country (main index)
> LastUpdate
> GroupID
>
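> As a reference, a rough DDL sketch of the schema above. Column names
> and types are assumptions inferred from the descriptions (e.g. jsonb
> for the payload, timestamptz for the UTC fields):
>
>     CREATE TABLE messages (
>         uuid        uuid PRIMARY KEY,
>         country     char(2),        -- ISO country code
>         role        text,
>         user_id     text,
>         group_id    integer,
>         channel     text,
>         title       text,
>         payload     jsonb,          -- up to ~20kb per row
>         starts_in   timestamptz,
>         expires_in  timestamptz,
>         seen        boolean,
>         deleted     boolean,
>         last_update timestamptz,
>         created_by  timestamptz,    -- described as UTC in the original
>         created_in  timestamptz
>     );
>
>     CREATE INDEX messages_user_country_idx ON messages (user_id, country);
>     CREATE INDEX messages_last_update_idx  ON messages (last_update);
>     CREATE INDEX messages_group_id_idx     ON messages (group_id);
>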
>
> We inserted 160MM rows, around 2KB each. No partitioning.
>
> Inserts started at around 3,000 rows per second but, as expected, slowed
> down as the number of rows increased. In the end we got around 500
> inserts per second.
>
> Queries by UserID + Country took less than 2 seconds, but while the
> batch insert was running the same queries took over 20 seconds!!!
>
> We had 20 Lambdas pulling messages from SQS and bulk-inserting them into
> PostgreSQL.
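>
> For illustration, the shape of the multi-row INSERT each Lambda might
> issue against the hypothetical schema sketched above (values and batch
> size are made up; batching many rows per statement is what makes it a
> bulk insert):
>
>     INSERT INTO messages (uuid, country, user_id, group_id, payload,
>                           created_in)
>     VALUES
>         ('00000000-0000-4000-8000-000000000001', 'BR', 'user-1', 1,
>          '{"title": "hello"}', now()),
>         ('00000000-0000-4000-8000-000000000002', 'BR', 'user-2', 1,
>          '{"title": "hi"}', now());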
>
> The insert performance is important, but we would slow it down if needed
> to ensure flatter query performance (below 2 seconds). Each query
> (UserID + Country) returns around 100 different messages, which are
> filtered and ordered by the synchronous Lambda function. So we don't do
> any special filtering, sorting, or full-text search in Postgres. In some
> ways we use it more like a glorified file system. :)
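>
> Concretely, the query in question is essentially the following (column
> and table names again follow the assumed sketch above), which the
> UserID + Country index should serve directly:
>
>     SELECT uuid, title, payload
>     FROM messages
>     WHERE user_id = 'user-1'
>       AND country = 'BR';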
>
> We are going to limit the number of Lambda workers to 1 or 2 and then
> run some queries concurrently to see whether query performance is
> affected too much. We aim to get at least 50 queries per second
> (returning 100 messages each) under 2 seconds, even when there are
> millions of messages on SQS being inserted into PG.
>
> We haven't done any performance tuning in the DB.
>
> With all that said, the question is:
>
> What can be done to ensure good query performance (UserID + Country)
> even while the bulk insert is running (which can be low priority)?
>
> We are limited to use AWS RDS at the moment.
>
> Cheers
>