Re: Storing files: 2.3TBytes, 17M file count

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Thomas Güttler <guettliml(at)thomas-guettler(dot)de>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing files: 2.3TBytes, 17M file count
Date: 2016-11-28 15:10:19
Message-ID: CAKt_Zfv-OZVDQGtxSOD=O4hSGSc2_BHoGxLkAAQpQRH549-0yA@mail.gmail.com
Lists: pgsql-general

On Mon, Nov 28, 2016 at 3:28 PM, Thomas Güttler <
guettliml(at)thomas-guettler(dot)de> wrote:

> Hi,
>
> PostgreSQL is rock solid and one of the most reliable parts of our
> toolchain.
>
> Thank you
>
> Up to now, we don't store files in PostgreSQL.
>
> I was told that you must not do this ... But that was 20 years ago.
>
>
> I have 2.3TBytes of files. File count is 17M
>
> Up to now we use rsync (via rsnapshot) to backup our data.
>
> But it takes longer and longer for rsync to detect
> the changes. Rsync checks many files. But daily only
> very few files really change. More than 99.9% don't.
>
> Since we already store our structured data in postgres, I think
> about storing the files in PostgreSQL, too.
>
> What is the current state of the art?
>
> Is it feasible to store files in PostgreSQL?
>
> Are there already projects which use PostgreSQL as storage backend?
>
> I have the hope, that it would be easier to backup only the files which
> changed.
>

There is a tradeoff. On the one hand, as you note, backups are easier if
you store the files in PostgreSQL. Now, I have *not* looked at how this
would work with binary-format transfer, so that might be different, but in
most cases I have looked at, the downside is in the encoding and decoding.

If files come across as hexadecimal, then you already have twice as many
bytes in transfer as in storage. On top of that, driver-based encoding
typically copies the data in the process, meaning that you end up with many
times the file's size in RAM. When I tested this in Perl, it was common for
8x the size of the file to be used in RAM in the course of decoding and
sending it on. Driver, framework, and encoding may affect this, however.
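
To make the hex overhead concrete, here is a minimal standalone sketch
(no database needed, just Python's standard library) of what the default
bytea hex text representation costs on the wire:

```python
# Standalone illustration of the hex-encoding overhead: PostgreSQL's
# default bytea text output is hex ("\x..."), so every byte of the
# file becomes two bytes in transit unless the driver uses the
# binary protocol.
import binascii

payload = b"\x00\x01" * (1024 * 1024)  # a 2 MB binary "file"
hex_form = binascii.hexlify(payload)   # what a text-mode transfer carries

print(len(payload))   # bytes on disk
print(len(hex_form))  # bytes transferred: exactly 2x
```

If the driver uses the binary protocol this doubling goes away, though
in-process copies made by the driver and framework can still multiply
RAM usage, as above.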

Now, depending on what you are doing, that may not be a problem. It sounds
like you have a large number of files, up to a number of MB each in size.
Since that memory usage would be short-lived, it may not be a problem, but
I cannot say for you whether it is or not.

So be aware of the tradeoff and decide appropriately.

>
> Regards,
> Thomas Güttler
>
>
> Related question at rsnapshot mailing list:
> https://sourceforge.net/p/rsnapshot/mailman/rsnapshot-discuss/thread/57A1A2F3(dot)5090409(at)thomas-guettler(dot)de/
>
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
