Re: to BLOB or not to BLOB

From: KuroiNeko <evpopkov(at)carrier(dot)kiev(dot)ua>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: to BLOB or not to BLOB
Date: 2001-05-14 10:49:24
Message-ID: 3AFFB834.nail8N11QH45@ed.ed
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> I am going to write an application tht writes big amounts of plain text
into a database.

What kind of data are those? Articles? News/mail messages? Other?

> I thought of using the text type for this, but I don't know if it has a
maxlenght, and given that these will be very long texts I started wondered
if these would have to be blobs... but they aren't binary.

Well, plain text is a `subset' of `binary,' so storing it as a BLOB
shouldn't be an issue. My home-brewn email archive stores message bodies as
BLOBs with absolutely no problem.
On length limit of text type field, for older versions of PGSQL it should
be less that page size (default 8 KB, configurable at compile time). In
TOASTed PGSQL there's probably no limit set by DBMS itself.

> So, does the text daya type have a maxium lenght? Do BLOBs? What should I
use? For the time being, at least, I won't be searching these texts with a
search engine or anything... but if I were to be, what considerations
should I take into account when designing the tables?

As you already figured it, the answer to your question depends on what do
you want from your system. Maybe you'll be OK with BLOBs but you have to
keep in mind that pg_dump couldn't handle them and you need to perform
somewhat tricky things to backup and restore PGSQL DB with BLOBs.
Another thing to consider is searchability and indexes. If you really
don't want (and never going to) search you big fields, your best way would
be to store just file names and to keep long chunks of text in external
files. Sure, you can store them in TOASTed fields if you need indexes,
but _IMHO_, this is yet to be proven that simplicity of design and
implementation is worth _possiblie_ preformance degradation. I believe we
still have to see good and fast index built on a 100 KB text field.
Maybe what you need is FTS, because IMNSHO, there's no much use of an
attribute that can't be indexed and searched (updated, joined, grouped) on
swiftly. Everything relatively small and fast to process goes to DB, huge
portions of not-easily-indexable data should stay outside.
Actually, FTS is a last resort, kind of. You may wish to split your text
into smaller fragments with similar semantical (logical, whatever) load.

--

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Vince Vielhaber 2001-05-14 11:43:42 Re: Invoices
Previous Message Alexander Lohse 2001-05-14 09:13:15 Speeding up Query