
Re: Storing files: 2.3TBytes, 17M file count

From:	Thomas Güttler <guettliml(at)thomas-guettler(dot)de>
To:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Storing files: 2.3TBytes, 17M file count
Date:	2016-11-29 09:50:52
Message-ID:	f3f83bb0-1031-ace9-b6ed-bec74099a793@thomas-guettler.de
Lists:	pgsql-general

On 29.11.2016 at 01:52, Mike Sofen wrote:
> From: Thomas Güttler Sent: Monday, November 28, 2016 6:28 AM
>
> ...I have 2.3TBytes of files. File count is 17M
>
> Since we already store our structured data in Postgres, I am thinking about storing the files in PostgreSQL, too.
>
> Is it feasible to store files in PostgreSQL?
>
> -------
>
> I am doing something similar, but in reverse. The legacy MySQL databases I’m converting into a modern Postgres data
> model have very large genomic strings stored in 3 separate columns. Out of the 25 TB of legacy data storage (in 800
> dbs across 4 servers, about 22 billion rows), those 3 columns consume 90% of the total space, and they are used only
> for reference, never in searches or calculations. They range from 1k to several MB.
>
> Since I am collapsing all 800 dbs into a single PG db, being very smart about storage was critical. Since we’re also
> migrating everything to AWS, we’re placing those 3 strings (per row) into a single JSON document and storing the
> document in S3 buckets, with the pointer to the file being the globally unique PK for the row. Super simple. The app
> tier knows to fetch the data from the db and the large-string JSON from S3. Retrieval is surprisingly fast, and
> this is all real-time web app stuff.
>
> This is a model that could work for anyone dealing with large objects (text or binary). The nice part is that the
> original 25 TB of data storage drops to 5 TB, a much more manageable number that allows for the significant growth
> on the horizon.

Thank you Mike for your feedback.

Yes, I think I will drop my idea. Encoding the binary file content to text and then decoding it back to binary makes
no sense. I was not aware that this would be needed.
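
To put a rough number on that round trip, here is a minimal Python sketch. It assumes the file content travels as
text, either as base64 or in a hex form like PostgreSQL's bytea hex format (`\x` followed by two hex digits per
byte). Whether a given driver really does this depends on the driver and protocol mode; the payload below is made up
purely for illustration.

```python
import base64
import binascii

# 3072 bytes of arbitrary binary data (made-up stand-in for file content)
payload = bytes(range(256)) * 12

# Two common binary-to-text encodings
b64_text = base64.b64encode(payload).decode("ascii")
hex_text = "\\x" + binascii.hexlify(payload).decode("ascii")

# Size of each text form relative to the raw bytes
b64_ratio = len(b64_text) / len(payload)
hex_ratio = len(hex_text) / len(payload)

# Decoding restores the original bytes, so the cost is size and CPU, not data loss
assert base64.b64decode(b64_text) == payload
assert binascii.unhexlify(hex_text[2:]) == payload

print(f"raw={len(payload)} base64={len(b64_text)} hex={len(hex_text)}")
print(f"base64 ratio={b64_ratio:.2f} hex ratio={hex_ratio:.2f}")
```

Base64 grows the data by about a third and the hex form roughly doubles it, on top of the encode and decode work on
both ends.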

I guess I will use some key-to-blob store like S3. AFAIK there are open source S3 implementations available.
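
A minimal sketch of what that could look like, following the idea Mike describes of using the row's primary key as
the pointer to the blob. Everything here is hypothetical: `BlobStore`, `InMemoryBlobStore`, `save_file`, `load_file`
and the `files/` key prefix are illustration names, and the in-memory dict only stands in for a real S3-style
bucket. The structured data would stay in PostgreSQL as before.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal key-to-blob interface (not a real S3 client API)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in for an S3-style bucket; a real store would do network I/O."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def blob_key(row_pk: int) -> str:
    # The database primary key doubles as the pointer to the blob.
    return f"files/{row_pk}"


def save_file(store: BlobStore, row_pk: int, content: bytes) -> None:
    # Raw bytes go in unchanged: no binary-to-text encoding step.
    store.put(blob_key(row_pk), content)


def load_file(store: BlobStore, row_pk: int) -> bytes:
    return store.get(blob_key(row_pk))


store = InMemoryBlobStore()
save_file(store, 42, b"\x00\x01binary file content\xff")
restored = load_file(store, 42)
```

Because the key is derived from the primary key, the database row needs no extra column for the file location.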

Thank you all for your feedback!

Regards, Thomas

--
Thomas Guettler http://www.thomas-guettler.de/

In response to

Re: Storing files: 2.3TBytes, 17M file count at 2016-11-29 00:52:34 from Mike Sofen

Responses

Re: Storing files: 2.3TBytes, 17M file count at 2016-11-29 10:09:01 from Jerome Wagner
Re: Storing files: 2.3TBytes, 17M file count at 2016-11-29 12:06:51 from Stuart Bishop
Re: Storing files: 2.3TBytes, 17M file count at 2016-11-29 15:27:24 from Adrian Klaver
