Re: How to store "blobs" efficiently for small and large sizes, with random access

From: esconsult1(at)gmail(dot)com
To: Dominique Devienne <ddevienne(at)gmail(dot)com>
Cc: Andreas Joseph Krogh <andreas(at)visena(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to store "blobs" efficiently for small and large sizes, with random access
Date: 2022-10-19 12:50:10
Message-ID: 0BE00147-79F4-45E8-BDEF-E809BCFAA668@gmail.com
Lists: pgsql-general

We had the same thought of storing the blobs inside large objects (LOs) many years ago.

But we ultimately chose cloud storage and stored a pointer in the database instead.

Now that we are approaching a terabyte of just the normal data, I don’t regret this decision one bit. Just handling backups and storage is already a chore.

Data in S3 compatible storage is very easy to protect in numerous ways.

We have one set of code responsible for uploading, downloading and deleting the files themselves.

One downside? Occasionally an S3 delete fails, and now and again a file or two gets orphaned. But in 11 years we have never failed to find a file pointed to from our attachments table.
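A periodic reconciliation sweep can catch those orphans. Here is a minimal sketch of the idea, with made-up keys and helper names (this is not their actual code): list the keys present in the object store, list the keys referenced by the attachments table, and flag anything stored but unreferenced.

```python
# Hypothetical orphan sweep: compare object-store keys against the keys
# referenced by the attachments table, and report unreferenced objects.
def find_orphans(stored_keys, referenced_keys):
    """Return object-store keys that no attachment row points to."""
    return sorted(set(stored_keys) - set(referenced_keys))

# Example with made-up keys:
in_store = ["2022/10/a.pdf", "2022/10/b.pdf", "2022/10/c.pdf"]
in_db = ["2022/10/a.pdf", "2022/10/c.pdf"]
print(find_orphans(in_store, in_db))  # -> ['2022/10/b.pdf']
```

In practice the two listings would come from the storage API and a query against the attachments table; the comparison itself is just this set difference.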

We also store only pathnames/base names, so we can easily move storage providers if we decide to go on-prem.
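The provider independence comes from keeping the endpoint out of the rows entirely. A rough sketch of what that looks like, with an assumed endpoint and helper name (not their actual code): rows hold only a bucket-relative key, and the full URL is composed at read time from configuration.

```python
# Hypothetical setup: the database stores only a bucket-relative key;
# the provider endpoint lives in configuration, so switching providers
# (or going on-prem) means changing one setting, not rewriting rows.
S3_ENDPOINT = "https://s3.example.com/my-bucket"  # assumed config value

def object_url(base_name: str, endpoint: str = S3_ENDPOINT) -> str:
    """Build the full download URL for a stored attachment key."""
    return f"{endpoint}/{base_name}"

print(object_url("invoices/2022/0042.pdf"))
# -> https://s3.example.com/my-bucket/invoices/2022/0042.pdf
```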

There is absolutely no upside to storing files in the db if you anticipate any kind of growth or significant volume.

Ericson Smith
CTO
Travel Agency Tribes

Sent from my iPhone

> On 19 Oct 2022, at 7:01 PM, Dominique Devienne <ddevienne(at)gmail(dot)com> wrote:
>
> 
>> On Wed, Oct 19, 2022 at 1:38 PM Andreas Joseph Krogh <andreas(at)visena(dot)com> wrote:
>
>> There's a reason “everybody” advises moving blobs out of the DB, I've learned.
>
> I get that. I really do. But the alternative has some real downsides too.
> Especially around security, as I already mentioned. That's why I'd like, if possible,
> to get input on the technical questions from my initial post.
>
> That's not to say we wouldn't ultimately move the big blobs out of the DB.
> But given how much that would complicate the project, I do believe it is better
> to do it as a second step, once the full system is up and running and testing at
> scale has actually been performed.
>
> We've already moved other kinds of data to PostgreSQL, that time from SQLite DBs (thousands of them),
> and ported the sharding done on the SQLite side "as-is" to PostgreSQL (despite TOAST).
> So far, so good: good ingestion rates, and decent runtime access to the data too,
> in the albeit limited testing we've done so far.
>
> Now we need to move this other kind of data, this time from proprietary DB-like files (thousands of them, too),
> to finish our system and finally be able to test the whole thing in earnest, at (our limited internal) scale.
>
> So you see, I'm not completely ignoring your advice.
>
> But for now, I'm inquiring as to the *best* way to put that data *in* PostgreSQL,
> with the requirements / constraints I've listed in the first post.
> It may indeed be a bad idea long term. But let's make the most of it for now.
> Makes sense? Am I being unreasonable here? --DD
