Re: How to store "blobs" efficiently for small and large sizes, with random access

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to store "blobs" efficiently for small and large sizes, with random access
Date: 2022-10-19 15:04:10
Message-ID: a9222d92-7163-5009-034d-bdd181e46819@gmail.com
Lists: pgsql-general

On 10/19/22 06:38, Andreas Joseph Krogh wrote:
> On Wednesday, 19 October 2022 at 13:21:38, Dominique Devienne
> <ddevienne(at)gmail(dot)com> wrote:
>
> On Wed, Oct 19, 2022 at 1:00 PM Andreas Joseph Krogh
> <andreas(at)visena(dot)com> wrote:
> > Ok, just something to think about;
>
> Thank you. I do appreciate the feedback.
>
> > Will your database grow beyond 10TB with blobs?
>
> The largest internal store I've seen (for the subset of data that goes
> in the DB) is shy of 3TB.
> But we are an ISV, not one of our clients, who have truly massive
> data scale.
> And they don't share the exact scale of their proprietary data with me...
>
> > If so, try to calculate how long it takes to restore while still
> > complying with the SLA, and how long it would have taken to restore
> > without the blobs.
>
> Something I don't quite get is why backup is somehow no longer needed
> if the large blobs are external?
> i.e. are you saying backups are so much worse in PostgreSQL than
> with the FS? I'm curious now.
>
> I'm not saying you don't need backup (or redundancy) of the other systems
> holding blobs, but moving them out of the RDBMS lets you restore the DB to
> a consistent state, and be able to serve clients, faster. In my experience
> it's quite unlikely that your (redundant) blob-store needs crash-recovery
> at the same time your DB does. The same goes for PITR, needed because of
> some logical error (like a client deleting data they shouldn't have),
> which is much faster without blobs in the DB and doesn't affect the
> blob-store at all (if you have a smart insert/update/delete policy there).
>

This is nothing to sneeze at.  Backing up a 30TB database takes a *long* time.
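
For what it's worth, one way to soften that (just a sketch, with a made-up
blob table name, not anyone's actual schema) is to dump the relational core
and the bulk bytea data separately, so the part that serves clients can be
restored first:

    # Dump schema plus all data EXCEPT the rows of the big blob table.
    pg_dump --exclude-table-data='check_image' -Fc -f core.dump mydb

    # Dump the blob table's rows on their own; restore them later.
    pg_dump --table='check_image' --data-only -Fc -f blobs.dump mydb

It doesn't make 30TB any smaller, but it does decouple "the DB is serving
again" from "all the images are back".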

> Also, managing the PostgreSQL server will mostly be the client's own
> concern. We are not into SaaS here.
> As hinted above, the truly massive data is already kept outside the DB,
> used by different systems, and processed down to the GB-sized inputs
> from which all the data put in the DB is generated. It's a scientific,
> data-heavy environment.
> And one where security of the data is paramount, for contractual and
> legal reasons. Files make that harder IMHO.
>
> Anyways, this is straying from the main theme of this post I'm afraid.
> Hopefully we can come back on the main one too. --DD
>
> There's a reason “everybody” advises moving blobs out of the DB, I've learned.
>

We deal with an ISV maintaining a banking application.  It stores scanned
images of checks as bytea fields in a PostgreSQL 9.6 database.  The next
version will store the images outside of the database.
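
To make the contrast concrete, here is a minimal sketch of the two layouts
(table and column names are made up, not the ISV's actual schema):

    -- Current approach: the image bytes live in the database row (TOASTed
    -- when large), so every dump/restore has to move them.
    CREATE TABLE check_image (
        check_id   bigint PRIMARY KEY,
        scanned_at timestamptz NOT NULL DEFAULT now(),
        image      bytea NOT NULL
    );

    -- Next version, roughly: the database keeps only a reference, and the
    -- bytes live in an external store (filesystem, object store, ...).
    CREATE TABLE check_image_ref (
        check_id    bigint PRIMARY KEY,
        scanned_at  timestamptz NOT NULL DEFAULT now(),
        storage_key text NOT NULL
    );

The second layout keeps dumps and PITR small and fast, at the price of
having to keep the external store consistent and backed up yourself --
which is exactly the trade-off being argued upthread.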

> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas(at)visena(dot)com
> www.visena.com <https://www.visena.com>

--
Angular momentum makes the world go 'round.
