Re: Where to store Blobs?

From: Jamesie Pic <jpic(at)yourlabs(dot)org>
To: Tim Cross <theophilusx(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Where to store Blobs?
Date: 2019-04-19 08:26:35
Message-ID: CAC6Op1_8oVQK1tcRON-HYWJpNWVPC7UbZ6NyeMnT_0Hrakn5rA@mail.gmail.com
Lists: pgsql-general

Storing files in the database has already been shown to be an inefficient
trade-off [0]. As such, this post offers a subjective story about files in
the database rather than an objective comparison.

So, I have this ridiculous situation: a database holding 12k user-submitted
forms with files, made by 3,000 users, which I hope we can agree "is
absolutely nothing", yet it eats 35 GB of PG data.

As a result, dumping the database already takes 32 minutes and is extremely
write-intensive, causing disk backlogs of more than 5 seconds during the
operation.

If I had stored file paths, as I should have, the db would take a few MB
and the dump a few seconds, maybe even less than one second.
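
For illustration, here is a minimal sketch of what that path-based design
could look like. The table and column names, the upload root, and the use
of psycopg2 are my own assumptions for the example, not something from the
actual application:

    # Sketch: write the file to disk, store only its path in PostgreSQL.
    # Names and paths below are hypothetical.
    import hashlib
    import os

    import psycopg2

    UPLOAD_ROOT = "/srv/uploads"  # assumed mount point for uploaded files

    def save_upload(conn, form_id: int, filename: str, data: bytes) -> str:
        """Persist the bytes on the filesystem and record only the path."""
        digest = hashlib.sha256(data).hexdigest()
        rel_path = os.path.join(digest[:2], digest)   # content-addressed layout
        abs_path = os.path.join(UPLOAD_ROOT, rel_path)
        os.makedirs(os.path.dirname(abs_path), exist_ok=True)
        with open(abs_path, "wb") as fh:
            fh.write(data)
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO form_attachments (form_id, filename, path) "
                "VALUES (%s, %s, %s)",
                (form_id, filename, rel_path),
            )
        conn.commit()
        return rel_path

With that layout, pg_dump only has to serialize a few small text columns,
and the heavy bytes never pass through the database at all.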

Also, the backup could just read the file tree from the file system and
synchronize only new files: for legal reasons we never modify files once
they are written, which makes it even more ridiculous to dump them every
time we do a backup.
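
For what it's worth, a minimal sketch of what that incremental
synchronization could look like, assuming (as stated above) that files are
never modified after being written, so presence in the destination is
enough to skip them. The paths are hypothetical:

    # Sketch: copy only files that are not yet present in the backup tree.
    import os
    import shutil

    def sync_new_files(src_root: str, dst_root: str) -> int:
        """Copy files missing under dst_root; return how many were copied."""
        copied = 0
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel_dir = os.path.relpath(dirpath, src_root)
            for name in filenames:
                dst_dir = os.path.join(dst_root, rel_dir)
                dst_file = os.path.join(dst_dir, name)
                if os.path.exists(dst_file):
                    continue  # already backed up and never modified, skip it
                os.makedirs(dst_dir, exist_ok=True)
                shutil.copy2(os.path.join(dirpath, name), dst_file)
                copied += 1
        return copied

    # e.g. sync_new_files("/srv/uploads", "/backups/uploads")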

As such, my biggest regret is having listened to the manager who imposed
this design decision; I trusted him because I was a fresh hire at the time.

Today, we have the choice of two poisons for deployment:

- not shutting down the service during the backup, which saves 32 minutes
  of downtime but leaves 32 minutes of writes at risk in the case of a
  faulty destructive operation (yes, that doesn't happen 99% of the time);
- shutting down the service during the backup, as we should, but that
  means 32 minutes of extra downtime.

In my experience, storing files in the database is a critical mistake.
Maybe it can work if you know what you're doing, and I thought said manager
knew what he was doing.

But at the end of a year, the manager decided to ignore all the monitoring
alarms about disk space that were raised during backups. So I kept going,
freeing as much space as possible whenever a backup failed, which
fortunately canceled the full deployment and left the service online,
albeit with a full disk.

I raised the warning to the customer for months, but the manager kept
insisting that we turn a blind eye to it and keep applying palliative
fixes when needed.

Of course, piling up palliative fixes in production eventually created the
situation where the disk was too full to make a backup. The manager who had
installed the server OS had put 3 disks in a RAID1 array with extremely
poor partitioning.

As such, I had to spend a night repartitioning the RAID1 so that / would be
on 10G instead of 40G. That kept us going a bit longer, but unfortunately
one week less than expected, because I had forgotten to account for the
exponential growth in my estimate.

This led to even more incidents. If you store files in the database, which
you shouldn't unless you're 100% sure about what you're doing, then do not
ignore disk space warnings during backups. Or else … well, what do you
think happens when an airplane pilot ignores the alarms on their dashboard?

99% of incidents are a chain of predictable events.

[0] https://habiletechnologies.com/blog/better-saving-files-database-file-system/
[1] https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
[2] https://blog.yourlabs.org/post/184290880553/story-of-a-database-with-files-in-it
