- *General*
- Our system gathers data from CI builds regarding a customer's code
base
- *Data includes*
- type
- methods/lines/branches A.K.A - code elements
- files
- *Queries*
- The data is queried for a specific build only
- Queries are aggregations on the code elements and files with some
filtering and sorting
- *Volume*
- I expect to have about 1500 builds a day, 45000 builds a month
- A build can have around 300000 code elements and around 30000 files
- *Retention*
- Thought about keeping 90 days of builds as retention
- *Plan*
- 2 tables
- code elements
- files
- create a partition for each build
- each day delete partitions older than 90 days
- create 2 schemas for each client
- <customer-id> schema which holds the parent tables
- <customer-id>_partitions schema that will hold the partitions
- upon a new build
- create a partition for the code elements table and for the files
table in the "<customer-id>_partitions" schema
- *Questions*
- Will the db hold 135000 (45000 * 3 months) partitions under the
assumption that *I query only 1 partition?*
- Should I model it differently in terms of schema, partitions etc.?