Re: BLOB support

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BLOB support
Date: 2011-06-06 15:23:08
Message-ID: 201106061723.08711.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Monday 06 of June 2011 16:13:26
> =?UTF-8?Q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
> > I think more about this with contrast to sent references, but I still
> > have in my mind construct
> > Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit
> > outdated we have BlueRay
> > conn.prepareStatemnt("INSERT INTO someonetubevideos values (?)")
> > where 1st parameter is myWeddingDvd,
>
> Yes, if you insist upon designing the API like that, then you come to
> the conclusion that you need global LOB identifiers.
>
> However, there are many ways to design this that don't work that way.
> One idea to think about is
>
> insert into someonetubevideos values('')
> returning open_for_write(videocolumn)
>
> which gives you back some kind of writable stream ID (this is a
> transient, within-session ID, not global) for the target field in the
> row you just inserted.
I know, but this is a little bit old-fashioned bahaviour.

> BTW, as was noted upthread by Dimitri, this whole subject has been
> discussed before on pgsql-hackers. You really ought to go re-read the
> previous threads.
>
> regards, tom lane

I read this, but it may be rethinked again. Actaully changes to TOAST (I mean
streaming will be just for LOBs, I had written all found disadvantages for
TOAST for LOB and it's looks like only performance of above is some kind of
disadvantage, as well this prevent some less usefull concepts of Copy on Write
for LOBs.

Introducing streaming for TOAST is little useless, sorry just for cite from
my, mentoined document:

(This is generally about on demand stream of TOASTed value, in
context of LOBs is acceptable, as long not transactional aware LOBs are
acceptable). If we will add streaming of TOASTed values, so caller will
get some reference to this value, we need to ensure that pointed data
will not be changed, nor deleted - I think this will require caller to
add FOR UPDATE (or silently to add this by server) for each statement
returning pointers to TOASTed, as client may do transactional query, and
other client just after (1st) may remove record, commit, and call
VACUUM. In this situation when 1st will try to read data form given row,
it will get error. This may be accpetable for LOBs (commonly LOBs may be not
transaction aware, but I will be angry if this will happen with VARCHAR)

If this is acceptable I will do following changes.

Add
- server_max_in_memory_lob_size - GUC server start-only config to describe
maximum value of client session parameter max_in_memory_lob.

- max_in_memory_lob - session GUC describing how huge LOBs may be keept in
memory before backing up to file

- rescursivly toasting, detoasting during insert/update/remove for searching
for LOBs (we need this for arrays and complex types) - this is for last stage
(error disallowing LOBs in composites/arrays may be quite enaugh, for
begining) - I want LOBs to be starting point for LOBing other types (e.g. some
big arrays may be LOBbed).

- during toasting, lob will be toasted and in place of LOB, the reference to
it will be putted, and encoded in LOB datum.

- api for LOB manipulation (few changes to current implementation) in way that
BLOB is field type and uninon of in_memory, in_file, in_toast. This few bytes
for LOB will not affect size of datum looking at size of LOB.

- api for maintaing temoraly lob files, we need this as per session list of id
-> file desc, to prevent prevent stealing of lobs by different connections
(security)

- streaming api for TOASTED values (based on COPY protocol, or changed COPY
protocol) or at least function calls - I havent looked at this in context of
TOASTed LOBs.

Is it good?

Regards,
Radek

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-06-06 15:41:11 Re: BLOB support
Previous Message Tom Lane 2011-06-06 15:03:43 Re: Domains versus polymorphic functions, redux