From: sud <suds1434(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Partitioning options
Date: 2024-02-08 05:42:26
Message-ID: CAD=mzVUsZGHg2OGp=kEXa3Ynyfbi9F0u4Ai0yZY3h7aofNG=6A@mail.gmail.com
Lists: pgsql-general
Hi,
We have a system that stores customers' transactions. There are ~100K
customers currently, and the list will grow in the future, though not
drastically (maybe ~50K more). The number of transactions per day is ~400
million, and we want to persist them in our Postgres database for ~5 months.
The key transaction table is going to receive ~450 million transactions per
day, and data querying/filtering will always be based on the
"transaction_date" column. JOINs will mostly happen on the "CUSTOMER_ID"
column, along with filters on customer_id in some scenarios. Each day's
worth of transactions consumes ~130 GB of storage, as we verified using the
"pg_relation_size" function on a sample data set.
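For reference, this is roughly how such a size check can be done (the table name below is just a placeholder; note that pg_total_relation_size also counts indexes and TOAST, so it will report more than pg_relation_size alone):

```sql
-- Approximate on-disk size of one day's table/partition.
-- 'transactions_sample' is a hypothetical name for illustration.
SELECT pg_size_pretty(pg_relation_size('transactions_sample'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('transactions_sample')) AS total_size;
```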
As mentioned, there will be ~100K distinct "customer_id" values, but the
data won't be equally distributed; it will be skewed. For example, a few
big customers will hold the majority of the transactions (say 20-30% of the
total), and the rest will be spread across the other customers, again not
equally.
So my question is: in the above scenario, should we go for a composite
partitioning strategy, i.e. range/hash (daily range partition by
transaction_date and hash subpartition by customer_id)?
OR
Should we go for simple daily range partitioning on the transaction_date
column?
OR
Range/list composite partitioning (range partition by transaction_date and
list subpartition by customer_id)?
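To make the first option concrete, here is a minimal sketch of the range/hash composite layout (all table names, columns, and the modulus of 4 are illustrative assumptions, not a recommendation; in practice the daily partitions would be created by a script or by pg_partman). One caveat worth noting: hash partitioning assigns each customer_id to exactly one bucket, so a single very large customer still lands entirely in one subpartition; hashing spreads distinct keys, it does not balance a skewed key.

```sql
-- Sketch: daily range partitions on transaction_date, each
-- subpartitioned by hash of customer_id.
CREATE TABLE transactions (
    transaction_id   bigint        NOT NULL,
    customer_id      bigint        NOT NULL,
    transaction_date date          NOT NULL,
    amount           numeric(12,2)
) PARTITION BY RANGE (transaction_date);

-- One day's range partition, itself partitioned by hash.
CREATE TABLE transactions_2024_02_08
    PARTITION OF transactions
    FOR VALUES FROM ('2024-02-08') TO ('2024-02-09')
    PARTITION BY HASH (customer_id);

-- Hash subpartitions (modulus 4 chosen only for the example).
CREATE TABLE transactions_2024_02_08_h0
    PARTITION OF transactions_2024_02_08
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...and likewise for remainders 1 through 3.
```

With simple daily range partitioning (the second option), retention is just `DROP TABLE` on the oldest day's partition, and date filters get partition pruning; the subpartitioning question is essentially whether customer_id pruning on JOINs is worth the extra partition count.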
Thanks and Regards
Sud