Re: To BLOB Or Not To BLOB

From: chewie(at)wookimus(dot)net
To: pgsql-general(at)postgresql(dot)org
Subject: Re: To BLOB Or Not To BLOB
Date: 2000-04-17 07:46:54
Message-ID: 20000417024654.C11802@wookimus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

How do I start this? There are so many different ways to look at this
question. Do I save files inside the database, or do I simply use the database
as a tool to query information about a given file, it's properties, etc.

Instead of making general statements, let me pick specific examples. Let's say
we want to create an MP3 repository. Our goals are to be able to query the
information saved in ID3 tags, to locate, organize, manipulate, or play the
MP3. To narrow the scope of discussion, let's say I will be accessing these
files locally, on my filesystem or a NFS mounted directory.

What would be the most efficient way for us to query the information found in
the ID3 tags? We could parse each file in a number of directories and their
subdirectories for the information we require each time we need it, or we could
parse it once and save the information in a database structured around ID3
information. I would say the later is definitely more desireable and
acceptable. Inside the database, the ID3 tags can be indexed and our searches
optimized by the information we're requesting.

But should we store the mp3's inside the database? I say no. A few
observations:

* Filesystems are generally better at allocating space for files than
databases are.
* Storing files inside databases renders shell access or direct access to
files as impossible. If the database goes South, so does access to the
files.

Now, our mp3 decoders are shell programs that read mp3 input by opening a file,
a URL connection, or reading it from standard input (that is if we have a good
mp3 player). Yet, in order for these mp3 decoders to be able to play an mp3
found inside the database, we are forced to use a specialized database
interface. It may return the mp3 file to the decoder via standard input,
or it may export the file to a temp directory and feed the mp3 player the
location of that temp file.

Yet, why should we go through all of this trouble? And why should we tie the
operation of two clearly separate applications into such inflexible
relationships? We add overhead to the database to output this file, and we add
another level of logic to get the file to the decoder. A level of logic that
could prove the downfall of a would-be MP3 mobile disc jockey, who's worked so
hard to please the Wicked Witch of the Wedding. He'd be forced to pull out the
CD's and tapes he so dreads -- if he remembered to bring them along.

If we keep file access decoupled from gleaning catagorizable, quantifiable data
about/from the file, we allow two separate and efficient tools to do their jobs
independently. If file access is your concern, then use a file system that
will give you the level of control you desire to develop your security
policies.

If grabbing information from a data file is important, such as parsing text
files for content, then build indexes to the data inside the file and store
those indexes in the database. Internet directories are excellent examples of
this! We don't neuter local file access by the applications that need the data
files, and we still provide access to searchable information.

I would argue that access to a data file and querying sets of data files for
information are two separate layers of logic. Logical layers are best served
when they are given tools to interact, but are not crippled by tight
integration. Saving data files inside a database, IMHO, is an example of tight
integration, and something to be avoided if possible -- at least in this case.

--
Chad "^chewie, gunnarr" Walstrom <chewie(at)wookimus(dot)net>
http://wookimus.net/chewie

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Stephen Davies 2000-04-17 08:42:37 Re: To BLOB Or Not To BLOB
Previous Message Haroldo Stenger 2000-04-17 04:47:46 Re: Does error within transaction imply restarting it?