From: Andrew Chernow <pg-job(at)esilo(dot)com>
To: Jorge Godoy <jgodoy(at)gmail(dot)com>
Cc: John McCawley <nospam(at)hardgeus(dot)com>, Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>, imageguy <imageguy1206(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Database versus filesystem for storing images
Date: 2007-01-05 23:32:02
Message-ID: 459EDFF2.9030303@esilo.com
Lists: pgsql-general
> copying 3 billion files and a few hundred terabytes while still maintaining an
> adequate service rate with part of its infra-structure down, just to use your
I wasn't saying to do this each time you run a backup, geez that would be
horrible. Pick up from where you left off the last time you backed up
data/records. How many images and how much data are being generated in a
60-second period? I doubt it's 3 billion files and hundreds of terabytes. Once
you know your data generation rate, you know what resources you need to
replicate this information to a backup server (local or remote).
How is this any different from db replication? It would have to back up the
same amount of information, so you would require the same horsepower and
bandwidth.
andrew
Jorge Godoy wrote:
> Andrew Chernow <pg-job(at)esilo(dot)com> writes:
>
>>> And how do you guarantee that after a failure? You're restoring two
>>> different sets of data here:
>>> How do you link them together on that specific operation? Or even on a daily
>>> basis, if you get corrupted data...
>> I answered that already.
>
> I'm sorry. It must be the flu, the pain or something else, but I really don't
> remember reading your message about how you can be 100% sure that all
> references to the filesystem have their corresponding files present and also
> all present files have their respective database entry.
>
> I've seen HA measures (I don't imagine anyone sacrificing their customers
> copying 3 billion files and a few hundred terabytes while still maintaining an
> adequate service rate with part of its infra-structure down, just to use your
> example to that answer...), ideas about requiring an answer from the
> filesystem before considering the transaction done DB-wise (who guarantees
> that the image really went to disk and is not still in the cache when the
> machine has a power failure and shuts down abruptly?)...
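For what it's worth, the usual mitigation for the write-back-cache concern in
the quoted paragraph is to fsync the file (and its directory entry) before
committing the database row that references it. A POSIX-only sketch, not from
the thread; the function name and the write-then-rename layout are
illustrative.

```python
import os

def durable_write(path, data):
    """Write `data` to `path` and force it to stable storage before
    returning, so a DB row referencing the file is only committed once
    the bytes are actually on disk, not merely in the OS cache."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # flush file contents through the cache
    os.rename(tmp, path)          # atomically publish under the final name
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)          # persist the directory entry as well
    finally:
        os.close(dir_fd)
```

Only after `durable_write` returns would the application commit the database
transaction that records the file. A crash in between leaves an orphan file
on disk, never a DB row pointing at a missing file, which is the easier side
to clean up.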
>
> I might have missed your message, though. Would you be gentle to quote that
> again, please?
>
>> Another nice feature is that the database and images can be handled separately.
>
> Which might be bad.
>
>> Some people have seen this as a disadvantage on this thread; I personally
>> don't see it that way.
>
> I am questioning two points that show two situations where it is bad.
> Especially if those images are important to the records (e.g. product failure
> images, prize winning images, product specs, prototype images, blueprints --
> after all, we don't need to restrict our files to images, right? --,
> agreements, spreadsheets with the last years of company account movements,
> documents received from lawyers, etc.).
>
>> I guess it depends on access needs, many files and how much data you have.
>> What if you had 3 billion files across a few hundred terabytes? Can you say
>> with experience how the database would hold up in this situation?
>
> I'd have partitioning if I had a case like that. Part of those would be
> delegated to one machine, part to another and so on. Even if that solution --
> partitioning -- makes the overall MTBF lower...
>
> And I still can't imagine how you guarantee that all 3 billion files have
> their corresponding entries in the database. Counting them is not enough,
> since I can have one file with the wrong "name" present on the filesystem
> or some duplicate record in the DB...
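Counting indeed proves nothing; a set comparison of recorded paths against
on-disk paths catches both kinds of orphan. A sketch, assuming the recorded
paths come from something like `SELECT path FROM images` (the table and column
names are hypothetical, not from the thread):

```python
import os

def reconcile(db_paths, image_root):
    """Compare the paths recorded in the database with what is actually
    on disk, reporting both kinds of mismatch."""
    on_disk = set()
    for root, _dirs, files in os.walk(image_root):
        for name in files:
            on_disk.add(os.path.relpath(os.path.join(root, name), image_root))
    recorded = set(db_paths)
    missing_files = recorded - on_disk   # DB rows with no backing file
    orphan_files = on_disk - recorded    # files with no DB row
    return missing_files, orphan_files
```

Because it compares names rather than counts, a misnamed file and a duplicate
row both show up, one in each result set.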
>
>