Last thoughts about LONG

From: wieck(at)debis(dot)com (Jan Wieck)
To: wieck(at)debis(dot)com
Cc: pgman(at)candle(dot)pha(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org
Subject: Last thoughts about LONG
Date: 1999-12-11 16:21:28
Message-ID: m11wpGy-0003kGC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers

I wrote:

> Bruce Momjian wrote:
>
> > Should we use large objects for this, and beef them up. Seems that
> > would be a good way. I have considered putting them in a hash
> > bucket/directory tree for faster access to lots of large objects.
> >
> > There is a lot to say about storing long tuples outside the tables
> > because long tuples fill cache buffers and make short fields longer to
> > access.
>
> I thought to use a regular table. Of course, it will eat
> buffers ...

When looking at my actual implementation concept, I'm not
sure if it will win or lose compared with text itself!
Amazing, but I think it could win already at relatively
small text sizes (1-2K is IMHO small compared to what this
type could store).

Well, the implementation details. I really would like a few
comments to verify that it's complete before starting.

- A new field "rellongrelid" of type Oid is added to
pg_class. It contains the Oid of the long-value relation,
or the invalid Oid for tables that have no LONG attributes.

- At CREATE TABLE, a long value relation named
"_LONG<tablename>" is created for those tables that need
it, and it is of course dropped or truncated as
appropriate. The schema of this table is

    rowid      Oid,   -- oid of our main data row
    rowattno   int2,  -- the attribute number in main data
    chunk_seq  int4,  -- the part number of this data chunk
    chunk      text   -- the content of this data chunk

There is a unique index defined on (rowid, rowattno,
chunk_seq).

- The new data type is of variable size with the following
header:

    typedef struct LongData {
        int32     varsize;      /* total size, including this header */
        int32     datasize;     /* real size of the value's data */
        Oid       longrelid;    /* long relation, or InvalidOid if the
                                 * data follows the header in memory */
        Oid       rowid;        /* oid of the main data row */
        int16     rowattno;     /* attribute number in the main data */
    } LongData;

The type's input function is very simple. Allocate
sizeof(LongData) + strlen(input), set varsize to that,
datasize to strlen(input), and the rest to invalid and 0.
Then copy the input after the struct.
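
As an illustration, a minimal sketch of such an input
function (the name long_in is made up here; palloc() and
InvalidOid are the usual backend facilities):

    LongData *
    long_in(char *input)
    {
        int32       len = strlen(input);
        LongData   *result;

        /* room for the header plus the raw bytes of the value */
        result = (LongData *) palloc(sizeof(LongData) + len);
        result->varsize = sizeof(LongData) + len;
        result->datasize = len;
        result->longrelid = InvalidOid;    /* not stored externally */
        result->rowid = InvalidOid;
        result->rowattno = 0;

        /* the value itself lives directly after the struct */
        memcpy((char *) result + sizeof(LongData), input, len);
        return result;
    }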

The type's output function decides, based on the longrelid,
what to do. If it's invalid, it just outputs the bytes
stored after the struct (it must be a datum that resulted
from an input operation). If longrelid isn't invalid, it
does an index scan on that relation, fetching all tuples
that match rowid and rowattno. Since it knows the datasize,
it doesn't need them in the correct order; it can put them
at the right places into the allocated return buffer by
their chunk_seq.
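
Again just a sketch - LongScan, long_scan_begin(),
long_scan_next(), long_scan_end() and LONG_CHUNK_SIZE are
hypothetical stand-ins for the real index scan calls and
for whatever chunk size we settle on (chunk_seq assumed to
count from zero):

    char *
    long_out(LongData *value)
    {
        char       *result = (char *) palloc(value->datasize + 1);

        if (value->longrelid == InvalidOid)
        {
            /* plain value coming directly from the input function */
            memcpy(result, (char *) value + sizeof(LongData),
                   value->datasize);
        }
        else
        {
            LongScan   *scan;
            int32       seq;
            char       *chunk;
            int32       chunklen;

            /* index scan on (rowid, rowattno) in the long relation */
            scan = long_scan_begin(value->longrelid,
                                   value->rowid, value->rowattno);

            /* chunks may arrive in any order - chunk_seq says where */
            while (long_scan_next(scan, &seq, &chunk, &chunklen))
                memcpy(result + seq * LONG_CHUNK_SIZE,
                       chunk, chunklen);

            long_scan_end(scan);
        }

        result[value->datasize] = '\0';
        return result;
    }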

- For now (until we have enough experience to judge) I think
it would be better to forbid ALTER TABLE when LONG
attributes are involved. Sure, it must be implemented
eventually, but IMHO not in the first evaluation attempt.

Now to how the data goes in and out of the long relation.

- On heap_insert(), we look for non-NULL LONG attributes in
the tuple. Whether there could be any can simply be seen by
looking at the rellongrelid in rd_rel. We fetch the value
either from the memory after the LongData header or by
using the type output function (for fetching it from the
relation where it currently is!). Then we simply break it
up into single chunks and store them with our tuple's
information. Now we need to do something tricky - to shrink
the main data tuple's size, we form a new heap tuple with
the datums of the original one, but replace all LongData
items we stored by faked ones, where the varsize is
sizeof(LongData) and all the other information is set up
appropriately (see the sketch below). We append that faked
tuple instead, copy the resulting information into the
original tuple's header and throw the fake away.

This is the one point where I'm not totally sure. Could it
possibly be better or even required to copy the entire
faked tuple over the one we should have stored? It could
never need more space, so that wouldn't be a problem.
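
Setting up the faked datum might look like this (a sketch;
the helper name is made up, and rel, tuple_oid, attno and
value_length are assumed to be at hand in heap_insert()):

    /* build the stub that replaces the real value in the tuple */
    static void
    long_make_stub(LongData *stub, Relation rel,
                   Oid tuple_oid, int16 attno, int32 value_length)
    {
        stub->varsize = sizeof(LongData);   /* header only, no data */
        stub->datasize = value_length;      /* real size of the value */
        stub->longrelid = rel->rd_rel->rellongrelid;
        stub->rowid = tuple_oid;            /* oid of the main row */
        stub->rowattno = attno;             /* column within that row */
    }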

- On heap_replace(), we check for each LONG attribute
whether it is NULL, or whether the information in
longrelid, rowid and rowattno doesn't match our
rellongrelid, tupleid, and attno. In either case this
attribute might have old content in the long relation,
which we need to delete first.

The rest of the operation is exactly like for
heap_insert(), except when all the attribute's information
did match - then it's our own OLD value that wasn't
changed, so we can simply skip it - the existing data is
still valid (the check is sketched below).
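
That match test could be as simple as (a sketch, with a
made-up helper name):

    /* true if the datum already points at this row and column */
    static bool
    long_value_unchanged(LongData *v, Relation rel,
                         Oid rowid, int16 attno)
    {
        return v->longrelid == rel->rd_rel->rellongrelid &&
               v->rowid == rowid &&
               v->rowattno == attno;
    }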

- heap_delete() is so simple that I won't explain it here.

Now I hear you asking "how could this overhead be a win?" :-)

That's easy to explain. As long as you don't use a LONG
column in the WHERE clause, when will the data be fetched? At
the time it's finally clear that it's needed. That's when a
result tuple is sent to the client (type output) or when a
tuple resulting from INSERT ... SELECT should be stored.

Thus, all the tuples moving around in the execution tree,
getting joined together, abused by sorts and aggregates and
filtered out again, always contain only the small LongData
struct, not the data itself. Weren't there recent reports
about too-expensive sorts due to huge tuple sizes?

Another bonus would be this: what happens on an UPDATE to a
table having LONG attributes? If the attribute is not
modified, the OLD LongData will be found in the targetlist,
and we'll not waste any space by storing the same
information again. IIRC that was one of the biggest
concerns about storing huge data in tuples, but here it
disappears without leaving a trace - funny, eh?

It is so simple, that I fear I made some mistake somewhere.
But where?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================== wieck(at)debis(dot)com (Jan Wieck) #
