Re: Question on overall design

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Question on overall design
Date: 2023-12-09 16:10:42
Message-ID: CANzqJaAqUua7_sE63SMaJbeD7mt97Pi0VPpR=cMUZ6xsfXb3+Q@mail.gmail.com
Lists: pgsql-general

On Sat, Dec 9, 2023 at 6:14 AM veem v <veema0000(at)gmail(dot)com> wrote:
[snip]

> Many of the applications are moving from on-premise to the AWS cloud as
> part of a modernization journey, AWS being the chosen cloud partner. The
> product is also expected to expand across more regions, and this system
> must serve the resulting increase in transaction volume. We also have a
> requirement to persist transaction data for ~10 years so that it is
> available for analytics/data science use cases.
>
> So the team is thinking of splitting it into two parts:
>
> 1) An OLTP-type store in which we persist/write transaction data quickly
> and surface it to the UI-facing apps in near real time. This database
> would hold at most 60-90 days of transaction data. I'm not sure there is
> an Oracle Exadata equivalent on AWS, so the team plans to experiment with
> Aurora Postgres. Please correct me if there are other options we should
> consider.
>
> 2) Data older than ~90 days would move into another database or into S3
> object storage, be kept there for ~10 years, and remain queryable through
> the necessary APIs. That tier is supposed to cater to OLAP/analytics/data
> science use cases.
>
> Is the above design okay? Also, regarding the second point, i.e.
> persisting the historical data in a queryable state: should we go with a
> database like Snowflake, or keep the data on S3 as-is and make it
> queryable through APIs?
>

"OK" is relative, but it's what we did in a similar situation: two years of
data on-line and 5 years of data in compressed files in S3. (We're
required to keep data for 7 years, but they *never* ask for records more
than 2 years old. If they ever do, we'll manually load the relevant data
back into PG.)
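
As for the mechanics: Aurora/RDS Postgres ships the aws_s3 extension, so
the rotation can be done in plain SQL. A rough sketch, assuming a
range-partitioned transactions table (the bucket, table, and partition
names here are all made up):

    -- One-time setup (the aws_s3 extension is available on Aurora/RDS PG)
    CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

    -- Export an aged monthly partition to S3 as CSV
    SELECT *
      FROM aws_s3.query_export_to_s3(
             'SELECT * FROM transactions_2023_09',
             aws_commons.create_s3_uri('my-archive-bucket',
                                       'transactions/2023_09.csv',
                                       'us-east-1'),
             options := 'format csv');

    -- Once the export is verified, detach and drop the partition so the
    -- OLTP database keeps only the last ~90 days
    ALTER TABLE transactions DETACH PARTITION transactions_2023_09;
    DROP TABLE transactions_2023_09;

    -- If old records are ever requested, pull the file back into a
    -- scratch table
    SELECT aws_s3.table_import_from_s3(
             'transactions_restore', '', '(format csv)',
             aws_commons.create_s3_uri('my-archive-bucket',
                                       'transactions/2023_09.csv',
                                       'us-east-1'));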

(I can't imagine that querying billions of unindexed flat-file records via
S3 would be fast.)
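
If you do want ad-hoc queries against the archive without loading it back,
something like Athena over partitioned Parquet keeps each scan bounded;
flat unpartitioned CSV is what gets slow (and expensive, since Athena bills
per TB scanned). A sketch, with an illustrative schema and bucket:

    -- Athena external table over the archived data in S3
    CREATE EXTERNAL TABLE transactions_archive (
        txn_id   bigint,
        txn_ts   timestamp,
        amount   decimal(12,2),
        status   string
    )
    PARTITIONED BY (txn_month string)
    STORED AS PARQUET
    LOCATION 's3://my-archive-bucket/transactions/';

    -- Register partitions laid out as .../txn_month=2023-09/...
    MSCK REPAIR TABLE transactions_archive;

    -- Partition pruning means only one month's files are scanned
    SELECT count(*), sum(amount)
      FROM transactions_archive
     WHERE txn_month = '2023-09';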

How often do end users look for data more than 90 days old? Two years old?

How quickly does the old data need to be available?

Maybe a four-tiered system of PG-Snowflake-S3-S3_Glacier would balance
speed and cost (or maybe Snowflake would just drive up costs).
