Re: Storing large files in multiple schemas: BLOB or BYTEA

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: tigran2-postgres(at)riatest(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Storing large files in multiple schemas: BLOB or BYTEA
Date: 2012-10-11 09:34:15
Message-ID: CAKt_Zft9Aqwc2pWoFr1BBWyt8GqugYX9u=ERhqD0YOd7icGzHw@mail.gmail.com
Lists: pgsql-general

On Wed, Oct 10, 2012 at 10:56 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 10/11/2012 01:35 PM, tigran2-postgres(at)riatest(dot)com wrote:
>
>> Using files stored outside the database creates all sorts of problems.
>> For starters you lose ACID guarantees. I would prefer to keep them in
>> the database. We did a lot of experiments with Large Objects and they
>> really worked fine (we stored hundreds of LOBs ranging from a few MB up
>> to 1GB). Postgres does a really good job with Large Objects. If it were
>> not for the pg_dump problem I would not hesitate to use LOBs.
>>
>
> Yeah, a pg_dump mode that dumped everything but large objects would be
> nice.
>

I remember when that was the only way pg_dump worked, and it caused plenty
of problems.

But yeah, --only-lobs and --no-lobs might be nice switches.

>
> Right now I find storing large objects in the DB such a pain from a backup
> management point of view that I avoid it where possible.
>
>
> I'm now wondering about the idea of implementing a pg_dump option that
> dumped large objects into a directory tree like
> lobs/[loid]/[lob_md5]
> and wrote out a restore script that loaded them using `lo_import`.
>

I'm thinking of the problems that occurred when we used to require LOBs to
be backed up to binary archive formats...
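
For what it's worth, the restore side of that could be little more than a
script that walks the tree and emits server-side lo_import() calls. A rough
sketch (mine, untested; it assumes the lobs/[loid]/[lob_md5] layout above,
and note that the two-argument lo_import(path, oid) reads files on the
*server* filesystem, so the dump tree has to be visible to the server):

#!/usr/bin/env python3
"""Sketch: emit a restore script for a lobs/<loid>/<md5> dump tree."""
import sys
from pathlib import Path

def emit_restore_sql(dump_root, out=sys.stdout):
    for loid_dir in sorted(dump_root.iterdir()):
        # Descend only into directories named after a numeric OID.
        if not (loid_dir.is_dir() and loid_dir.name.isdigit()):
            continue
        for lob_file in loid_dir.iterdir():
            if lob_file.name.startswith("."):
                continue  # skip leftover .tmp.nnnn files
            # Each OID directory holds one file named after its md5;
            # recreate the object under its original OID.
            out.write("SELECT lo_import('%s', %s);\n"
                      % (lob_file.resolve(), loid_dir.name))

if __name__ == "__main__":
    emit_restore_sql(Path(sys.argv[1]))

Piping the output through psql would rebuild the objects; ownership and
grants would still need restoring separately, since they live in
pg_largeobject_metadata.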

> During dumping, temporary copies could be written to something like
> lobs/[loid]/.tmp.nnnn with the md5 being calculated on the fly as the byte
> stream is read. If the dumped file had the same md5 as the existing one
> it'd just delete the tempfile; otherwise the tempfile would be renamed to
> the calculated md5.
>
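
The write-then-rename dance there is easy to get right. A minimal sketch of
the dump side (my own illustration, not anything in pg_dump; lob_stream
stands for whatever file-like handle the large object is read through,
e.g. psycopg2's lobject):

import hashlib
import os
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time

def dump_lob(lob_stream, loid, dump_root):
    """Stream one LOB to lobs/<loid>/<md5>, leaving the existing
    file untouched when the content has not changed."""
    loid_dir = Path(dump_root) / str(loid)
    loid_dir.mkdir(parents=True, exist_ok=True)
    tmp = loid_dir / (".tmp.%d" % os.getpid())
    md5 = hashlib.md5()
    with open(tmp, "wb") as f:
        while True:
            chunk = lob_stream.read(CHUNK)
            if not chunk:
                break
            md5.update(chunk)  # md5 computed on the fly
            f.write(chunk)
    final = loid_dir / md5.hexdigest()
    if final.exists():
        tmp.unlink()       # identical dump already present
    else:
        tmp.rename(final)  # atomic within one filesystem
    return final

An incremental tool that tracks files by name would then see an unchanged
LOB as an unchanged file.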

>
> That way incremental backup systems could manage the dumped LOB tree
> without quite the same horrible degree of duplication as is currently faced
> when using lo in the database with pg_dump.
>

How do incremental backup systems cope with lots of data via pg_dump
anyway? I would think that's not the approach I would take to incremental
backups with PostgreSQL...

>
> A last_modified timestamp on `pg_largeobject_metadata` would be even
> better, allowing the cost of reading and discarding rarely-changed large
> objects to be avoided.
>
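
If that column existed, an incremental dumper could skip unchanged objects
with one catalog query. A sketch using psycopg2 (last_modified here is the
hypothetical column proposed above; stock pg_largeobject_metadata has no
such thing today):

import psycopg2

def changed_lob_oids(conn, since):
    """Return OIDs of large objects modified after `since`.

    Relies on the hypothetical last_modified column discussed above;
    the real catalog only carries ownership and ACL information.
    """
    with conn.cursor() as cur:
        cur.execute(
            "SELECT oid FROM pg_largeobject_metadata"
            " WHERE last_modified > %s",
            (since,),
        )
        return [oid for (oid,) in cur.fetchall()]

Everything else would fall back to the full read-and-hash path.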

It might be interesting to look at the issue of large objects from a total
backup perspective. I do wonder, though, where the end would be. You
could have 500MB text fields, and those might pose backup issues as well.
I suppose better LOB support in backups would give additional options for
management.

Best Wishes,
Chris Travers
