Re: Half billion records in one table? RDS

From: Aaron Werman <aaron(dot)werman(at)gmail(dot)com>
To: Jean Baro <jfbaro(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Half billion records in one table? RDS
Date: 2017-12-06 01:59:35
Message-ID: CAER1-9rYmb=UMM8psNeOPVMAvYgXsuzfC1Yw4YBGKEWWZsNdAg@mail.gmail.com
Lists: pgsql-performance

Why not store metadata in pg and the payload in S3?
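
Something along these lines (just a sketch; table and column names are
illustrative):

    CREATE TABLE message_metadata (
        message_id   uuid PRIMARY KEY,
        user_country text NOT NULL,
        user_id      text NOT NULL,
        title        text,
        created      timestamptz NOT NULL DEFAULT now(),
        body_s3_key  text NOT NULL  -- the large JSON body lives as an S3 object
    );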

On Mon, Nov 27, 2017 at 11:58 AM Jean Baro <jfbaro(at)gmail(dot)com> wrote:

> Hi there,
>
> We are creating a new DB which will behave much like a file system; I
> mean, there will be no complex queries or joins running in the DB. The idea
> is to grab the WHOLE set of messages for a particular user and then filter,
> order, combine, or full-text search them in the function itself (AWS Lambda).
> The maximum number of messages is limited to 1,000 per user, so we
> expect Postgres to deliver amazing performance for this scenario.
>
> As I am not really familiar with PG (9.6, or 10, in case RDS releases it
> before February), I would like to share what we are planning to do for this
> DB. If you guys could share your thoughts, that would be great! :)
>
> Table structure:
>
>
>
> · MessageID (UUID) - PK
>
> · UserCountry (ISO)
>
> · UserRole (TEXT 15)
>
> · UserID (TEXT 30) – FK (although there is no constraint)
>
> · LifeCycle (RANGE DATE? Or 2 TimeStampWithTZ? Start_date and
> end_date?)
>
> · Channel (TEXT 15)
>
> · Tags (TEXT 2000)
>
> · Menu (TEXT 200)
>
> · Icon (TEXT 500) – URL to an image which will be used as an icon;
>
> · Title (TEXT 150)
>
> · Body (JSON – up to 10K) – Metadata describing all the data for
> a specific type of message. The JSON changes according to the type of
> message. We are assuming most messages will use less than 1K for this
> field.
>
> · Delete (BOOLEAN) – Soft Delete
>
> · Created (Timestamp – With TZ)
>
> · CreatedBy (TEXT 50)
>
>
>
> Only 1 table
>
> · Messages
>
> 3 indexes:
>
> · MessageID PK (UUID)
>
> · Main fetch key (UserCountry + UserID) - *****
>
> · End_date (to locate old messages that can be moved to another
> DB, which will hold the old messages)
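>
> Roughly (the PK index is created implicitly by the PRIMARY KEY; index
> names are illustrative):
>
>     CREATE INDEX messages_user_fetch_idx ON messages (user_country, user_id);
>     CREATE INDEX messages_end_date_idx   ON messages (end_date);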
>
>
>
> Sizing and worst-case scenario:
>
>
>
> · 500MM messages in the main DB
>
> · 4K queries per second (by UserID) – max time of 500ms per
> query. Simple SELECT, no ORDER BY or GROUP BY; just grab all the messages
> for a particular user (max 1,000 messages per user) – see the example
> query below.
>
> · 1K inserts per second on average (so that in 1 hour we can
> insert around 3.6MM messages)
>
> · 1K deletes per second on average (so that in 1 hour we can
> remove around 3.6MM messages)
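>
> The fetch we expect to run 4K times per second would look roughly like
> this (values and column names are illustrative, matching the DDL sketch
> above):
>
>     SELECT *
>       FROM messages
>      WHERE user_country = 'BR'
>        AND user_id = 'some-user-id'
>        AND deleted = false;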
>
>
> My questions are:
>
>
> - Can we use any kind of compression with PostgreSQL which would result
> in reduced I/O and disk usage?
> - We are not relying on any kind of table partitioning; is that the
> best approach for this scenario?
> - Is PG on RDS capable of delivering this type of performance while
> requiring low maintenance?
> - What about autovacuum? Any suggestions on how to optimize it for such a
> workload (we will insert and delete millions of rows every day)? See the
> settings sketch after this list.
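>
> For reference, the per-table autovacuum settings we plan to start testing
> with (a sketch only, not a recommendation):
>
>     ALTER TABLE messages SET (
>         autovacuum_vacuum_scale_factor  = 0.01,   -- vacuum after ~1% dead rows instead of the 20% default
>         autovacuum_analyze_scale_factor = 0.005,
>         autovacuum_vacuum_cost_limit    = 2000    -- allow each autovacuum run to do more work between sleeps
>     );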
>
> P.S.: We are going to test all this, but if we don't get the performance
> we are expecting, all optimization tips from you guys will be really
> appreciated. :)
>
> Thanks
>
>
>
> --

Regards,
/Aaron
