Re: Best Strategy for Large Number of Images

From: Gavan Schneider <list(dot)pg(dot)gavan(at)pendari(dot)org>
To: Estevan Rech <softrech(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Rob Sargent <robjsargent(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Best Strategy for Large Number of Images
Date: 2021-12-15 21:14:02
Message-ID: AE9AABF7-7CEE-4D85-9343-4DCFC38A3880@pendari.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 16 Dec 2021, at 7:12, Estevan Rech wrote:

> I have an application that takes pictures of items and uses them as
> evidence in a report.
>
> These images are saved and then used to generate a report.
>
Once you start talking reports and evidence I think audit trail and verification a.k.a. “chain of evidence”.
With the images in the database they can be part of your information ecosystem, get a datestamp along with checksum and all this included in the audit trail.
All this can be done with the images in the file system and the database holds the pointers only… it just may not be as easy to convince the oppositions lawyers that it is tamper proof, except the best you can really get is tamper evident. 😉

> Each item averages 10 photos and I have about 2 million photos currently,
> with an average growth of 1 million photos over the next year.
>
> I think about using it within the database because of the ease of selecting
> records with the content of the images (when necessary).
>
That will be the same however you go. The selection will be on the context data you enter with the images. There has been no mention of searching within the image itself, and this is not something PG can do “out of the box” anyway.

> I think my biggest concern is to continue to use this strategy for the next
> 2 years and encounter a PostgreSQL limitation or some failure and have to
> migrate database again or have significant loss of images. And on the
> contrary too, if I use it on disk and in 2 years I find some failure or
> file limit in the operating system (currently it's windows server 2016)...
>
The total image data will end up somewhere less than 4Tb if your estimates continue to hold. That’s not going to stress PG. It’s not such a big deal on a modern home computer either, but I can’t make any informed comment on Windows anything. Realistically, once the data is in the database the whole can be moved to a more capable machine and/or OS if problems start to appear. PG will work wherever you deploy it.

> Related to this is the backup service that seems to me that in a disaster,
> I can restore the database relatively quickly if it's in the database. On
> the disk, I believe that the restoration is much slower and I don't think
> it's reliable that all the images are copied/restored.
>
If you have a reliable tested backup system for the database and don’t want to add to the workload then get the images into the database. If there are reasons why you end up with the images outside the database then, of course, the backup process has to be able to keep everything together and guarantee a full restore. Doable as well, but it does add some complexity.

If I am reading this correctly the use case is one of receiving data in the form of photos and field reports, entering the report items associated with the photos into the information system, generating a report (with embedded photos), and keeping records so disputes can be handled. This is not a high volume access to the images so there is no real need for optimum filesystem speed to serve the images… keeping them in the database as bytea is perfectly workable and will work for data volumes well beyond the current estimates.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Marc Millas 2021-12-15 21:19:06 Re: Best Strategy for Large Number of Images
Previous Message Bryn Llewellyn 2021-12-15 21:05:42 Re: Why can't I have a "language sql" anonymous block?