From: | Jean Baro <jfbaro(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Half billion records in one table? RDS |
Date: | 2017-11-27 17:58:01 |
Message-ID: | CA+fQeemNGkyF_Aw9=XpR_jhCeMrXtKq5qfzsTPWT3W-nWHcgjQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi there,
We are creating a new DB which will behave most like a file system, I mean,
there will be no complex queries or joins running in the DB. The idea is to
grab the WHOLE set of messages for a particular user and then filter,
order, combine or full text search in the function itself (AWS Lambda). The
maximum number of messages is limited to 1.000 messages per user. So we
expect Postgres to have an amazing performance for this scenario.
As I am not really familiar with PG (9.6, or 10, in case RDS release it
before February) I would like to share what we are planning to do for this
DB. So if you guys could share your thoughts, that would be great! :)
Table structure:
· MessageID (UUID) - PK
· UserCountry (ISO)
· UserRole (TEXT 15)
· UserID (TEXT 30) – FK (although there is no constraint)
· LifeCycle (RANGE DATE? Or 2 TimeStampWithTZ? Start_date and
end_date?)
· Channel (TEXT 15)
· Tags (TEXT 2000)
· Menu (TEXT 200)
· Icon (TEXT 500) – URL to an image which will be used as an icon;
· Title (TEXT 150)
· *Body (JSON – up to 10K) – Meta data describing all the data to a
specific type of message. The JSON changes according to the type of
message. We are assuming most messages will use less than 1K for this
field.*
· Delete (BOOLEAN) – Soft Delete
· Created (Timestamp – With TZ)
· CreatedBy (TEXT 50)
Only 1 table
· Messages
3 indexes:
· MessageID PK (UUID)
· Main fetch key (UserCountry + UserID) - *****
· End_date (To locate old messages that can be moved to another DB
- which will hold the old messages);
Sizing and worst case scenario:
· 500MM messages in the main DB
· 4K queries per second (by UserID) – Max time of 500ms per query.
Simples SELECT, with no ORDER, WHERE OR GROUP BY. Just grab all the
messages for a particular user. MAX 1000 messages per USER.
· 1K inserts per second on average (So that in 1 hour we can insert
around 3MM messages)
· 1K deletes per second on average (So that in 1 hour we can remove
around 3MM messages)
My question is:
- Can we use any kind of compression for PostgreSQL which would result
in reduced IO and disk size?
- We are not relying on any kind of table partitioning, is that the best
approach for this scenario?
- Is PG on RDS capable of delivering this type of performance while
requiring low maintenance?
- What about Auto Vacuum? Any suggestion how to optimize it for such a
work load (we will insert and delete millions of rows every day).
P.S.: We are going to test all this, but if we don't get the performance we
are expecting, all optimization tips from you guys will be really
appreciated. :)
Thanks
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2017-11-27 18:18:33 | Re: Half billion records in one table? RDS |
Previous Message | Dmitry Shalashov | 2017-11-25 18:59:40 | Re: Query became very slow after 9.6 -> 10 upgrade |