Re: [HACKERS] LONG

From: wieck(at)debis(dot)com (Jan Wieck)
To: pgman(at)candle(dot)pha(dot)pa(dot)us (Bruce Momjian)
Cc: wieck(at)debis(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] LONG
Date: 1999-12-13 06:27:06
Message-ID: m11xOws-0003kGC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers

Bruce Momjian wrote:

> > > No need for attno in there anymore.
> >
> > I still need it to explicitly remove one long value on
> > update, while the other one is untouched. Otherwise I would
> > have to drop all long values for the row together and
> > reinsert all new ones.
>
> I am suggesting the longoid is not the oid of the primary or long*
> table, but a unique id we assigned just to number all parts of the long*
> tuple. I thought that's what your oid was for.

It's not even an Oid of any existing tuple, just an
identifier to quickly find all the chunks of one LONG value
by (non-unique) index.

My idea is this now:

The schema of the expansion relation is

    value_id     Oid
    chunk_seq    int32
    chunk_data   text

with a non unique index on value_id.
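A minimal sketch of how one value might be cut into rows of this schema. CHUNK_SIZE, ExpansionRow and split_into_chunks() are hypothetical names used only for illustration, Oid is a stand-in typedef, and the tiny chunk size is chosen just to keep the example readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;           /* stand-in for the backend's Oid type */

#define CHUNK_SIZE 8            /* hypothetical; a real limit would be far larger */

/* One row of the expansion relation: (value_id, chunk_seq, chunk_data). */
typedef struct ExpansionRow
{
    Oid     value_id;
    int32_t chunk_seq;
    size_t  chunk_len;          /* bytes used in chunk_data */
    char    chunk_data[CHUNK_SIZE];
} ExpansionRow;

/*
 * Split data[0..len) into rows that all share one value_id and are
 * numbered by chunk_seq starting at 0.  Returns the number of rows
 * written, or -1 if max_rows is too small to hold the value.
 */
int
split_into_chunks(Oid value_id, const char *data, size_t len,
                  ExpansionRow *rows, int max_rows)
{
    int     nrows = 0;
    size_t  off = 0;

    assert(data != NULL && rows != NULL);

    while (off < len)
    {
        size_t  n = (len - off < CHUNK_SIZE) ? len - off : CHUNK_SIZE;

        if (nrows >= max_rows)
            return -1;
        rows[nrows].value_id = value_id;
        rows[nrows].chunk_seq = nrows;
        rows[nrows].chunk_len = n;
        memcpy(rows[nrows].chunk_data, data + off, n);
        nrows++;
        off += n;
    }
    return nrows;
}
```

Reading the value back would then be a scan over the non-unique index on value_id, concatenating chunk_data in chunk_seq order.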

We change heap_formtuple(), heap_copytuple() etc. not to
allocate the entire thing in one palloc(). Instead the tuple
portion itself is allocated separately, and the current
memory context is remembered in the HeapTuple struct as well
(this is required below).

The long value reference in a tuple is defined as:

    vl_len        int32;   /* high bit set, 32-bit = 18 */
    vl_datasize   int32;   /* real vl_len of long value */
    vl_valueid    Oid;     /* value_id in expansion relation */
    vl_relid      Oid;     /* Oid of "expansion" table */
    vl_rowid      Oid;     /* Oid of the row in "primary" table */
    vl_attno      int16;   /* attribute number in "primary" table */

The tuple given to heap_update() (the most complex one) can
now contain usual VARLENA values of the format

high-bit=0|31-bit-size|data

or, if the value is possibly the result of a scan,

high-bit=1|31-bit=18|datasize|valueid|relid|rowid|attno
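The two header formats can be told apart with one bit test. The following is a sketch only: the mask names and helper functions are hypothetical, vl_len is declared unsigned here to keep the bit operations well defined, and the C struct does not model the packed 18-byte layout (a compiler is free to pad it).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;                   /* stand-in for the backend's Oid type */

#define VARLENA_LONGBIT   0x80000000u   /* hypothetical name for the high bit */
#define VARLENA_SIZEMASK  0x7FFFFFFFu   /* remaining 31 bits hold the size */

/* In-memory picture of the long value reference described above. */
typedef struct LongRef
{
    uint32_t vl_len;        /* high bit set, size field = 18 */
    int32_t  vl_datasize;   /* real size of the long value */
    Oid      vl_valueid;    /* value_id in expansion relation */
    Oid      vl_relid;      /* Oid of "expansion" table */
    Oid      vl_rowid;      /* Oid of the row in "primary" table */
    int16_t  vl_attno;      /* attribute number in "primary" table */
} LongRef;

/* Nonzero if the header word marks a long value reference. */
int
varlena_islong(uint32_t vl_len)
{
    return (vl_len & VARLENA_LONGBIT) != 0;
}

/* The 31-bit size field, with the flag bit stripped. */
uint32_t
varlena_size(uint32_t vl_len)
{
    return vl_len & VARLENA_SIZEMASK;
}

/* Build the header word of a long value reference. */
uint32_t
make_long_header(uint32_t size)
{
    assert(size <= VARLENA_SIZEMASK);
    return VARLENA_LONGBIT | size;
}
```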

Now there are a couple of different cases.

1. The value found is a plain VARLENA that must be moved
off.

To move it off, a new Oid for value_id is obtained, the
value itself is stored in the expansion relation, and the
attribute in the tuple is replaced by the above structure
with the values 1, 18, original VARSIZE(), value_id,
"expansion" relid, "primary" tuple's Oid and attno.

2. The value found is a long value reference that has our
own "expansion" relid and the correct rowid and attno.
This would be the result of an UPDATE without touching
this long value.

Nothing to be done.

3. The value found is a long value reference of another
attribute, row or relation and this attribute is enabled
for move off.

The long value is fetched from the expansion relation it
is living in, and the same as for 1. is done with that
value. There's space for optimization here, because we
might have room to store the value plain. This can happen
if the operation was an INSERT INTO t1 SELECT FROM t2,
where t1 has a few small attributes plus one varsize
attribute, while t2 has many, many long varsizes.

4. The value found is a long value reference of another
attribute, row or relation and this attribute is disabled
for move off (either per column or because our relation
does not have an expansion relation at all).

The long value is fetched from the expansion relation it
is living in, and the reference in our tuple is replaced
with this plain VARLENA.
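The four cases amount to a small decision function. This is a sketch under stated assumptions: every type and name below is hypothetical, an expansion relid of 0 is assumed to mean "no expansion relation", and ACT_STORE_PLAIN covers the unlisted situation of a plain value that does not need to be moved off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;               /* stand-in for the backend's Oid type */

typedef enum LongAction
{
    ACT_STORE_PLAIN,                /* plain value that stays in the tuple */
    ACT_MOVE_OFF,                   /* case 1 */
    ACT_KEEP_REFERENCE,             /* case 2: nothing to be done */
    ACT_FETCH_AND_MOVE_OFF,         /* case 3 */
    ACT_FETCH_AND_INLINE            /* case 4 */
} LongAction;

/* What heap_update() finds in one attribute of the new tuple. */
typedef struct AttrValue
{
    int     is_long;                /* high bit of vl_len set? */
    int     must_move;              /* plain value that must be moved off */
    Oid     relid;                  /* vl_relid, meaningful only if is_long */
    Oid     rowid;                  /* vl_rowid */
    int16_t attno;                  /* vl_attno */
} AttrValue;

/* The attribute being written. */
typedef struct TargetAttr
{
    Oid     expansion_relid;        /* 0 = relation has no expansion relation */
    Oid     rowid;
    int16_t attno;
    int     column_enabled;         /* move off enabled for this column? */
} TargetAttr;

LongAction
classify_attribute(const AttrValue *v, const TargetAttr *t)
{
    int     can_move_off;

    assert(v != NULL && t != NULL);
    can_move_off = (t->expansion_relid != 0 && t->column_enabled);

    if (!v->is_long)
        return (v->must_move && can_move_off) ? ACT_MOVE_OFF : ACT_STORE_PLAIN;

    /* Reference already belongs to exactly this relation, row and column. */
    if (can_move_off &&
        v->relid == t->expansion_relid &&
        v->rowid == t->rowid &&
        v->attno == t->attno)
        return ACT_KEEP_REFERENCE;

    /* Reference to some other attribute, row or relation. */
    return can_move_off ? ACT_FETCH_AND_MOVE_OFF : ACT_FETCH_AND_INLINE;
}
```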

This in-place replacement of values in the main tuple is the
reason why we have to allocate the tuple data separately and
remember the memory context it was allocated in. Due to the
above process, the tuple data can expand, and we then need to
switch into that context and reallocate it.

What heap_update() must do in addition is to examine the OLD
tuple (that it already has grabbed by CTID for header
modification) and delete, by their value_id, all long values
that are no longer present in the new tuple.
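The cleanup is a set difference over value_ids. A minimal sketch, assuming the value_ids of the old and new tuple have already been collected into arrays; collect_stale_value_ids() is a hypothetical helper, and the quadratic loop is acceptable only because a tuple has few attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for the backend's Oid type */

/*
 * Copy into stale[] every value_id that occurs in old_ids but not in
 * new_ids, and return how many were copied.  stale[] must have room
 * for nold entries.  These are the long values to delete from the
 * expansion relation.
 */
int
collect_stale_value_ids(const Oid *old_ids, int nold,
                        const Oid *new_ids, int nnew,
                        Oid *stale)
{
    int     nstale = 0;

    assert(nold >= 0 && nnew >= 0 && stale != NULL);

    for (int i = 0; i < nold; i++)
    {
        int     still_used = 0;

        for (int j = 0; j < nnew; j++)
        {
            if (old_ids[i] == new_ids[j])
            {
                still_used = 1;
                break;
            }
        }
        if (!still_used)
            stale[nstale++] = old_ids[i];
    }
    return nstale;
}
```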

The VARLENA arguments to type specific functions now can also
have both formats. The macro

#define VAR_GETPLAIN(arg) \
    (VARLENA_ISLONG(arg) ? expand_long(arg) : (arg))

can be used to get a pointer to an always plain
representation, and the macro

#define VAR_FREEPLAIN(arg,userptr) \
    do { if ((arg) != (userptr)) pfree(userptr); } while (0)

is to be used to tidy up before returning.

In this scenario, a function like smaller(text,text) would
look like

text *
smaller(text *t1, text *t2)
{
    text *plain1 = VAR_GETPLAIN(t1);
    text *plain2 = VAR_GETPLAIN(t2);
    text *result;

    if ( /* whatever to compare plain1 and plain2 */ )
        result = t1;
    else
        result = t2;

    VAR_FREEPLAIN(t1,plain1);
    VAR_FREEPLAIN(t2,plain2);

    return result;
}

The LRU cache used in expand_long() will make the repeated
expansion cheap enough. The benefit would be that huge values
resulting from table scans will be passed around in the
system (in and out of sorting, grouping etc.) as references
until they are modified or really stored/output.
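The mail does not say how the LRU cache is organized, so the following is only one possible shape: a fixed number of slots keyed by value_id, each stamped with a logical clock on use. All names are hypothetical, plain malloc()/free() stand in for palloc()/pfree(), and values are treated as NUL-terminated strings to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;               /* stand-in for the backend's Oid type */

#define CACHE_SLOTS 2               /* tiny on purpose, for illustration */

typedef struct CacheSlot
{
    Oid      value_id;
    char    *data;                  /* NULL while the slot is empty */
    uint64_t last_used;             /* 0 while the slot is empty */
} CacheSlot;

/* Must be zero-initialized before first use. */
typedef struct LongCache
{
    CacheSlot slots[CACHE_SLOTS];
    uint64_t  clock;                /* incremented on every use */
} LongCache;

/* Return the cached expansion of value_id, or NULL on a miss. */
const char *
cache_lookup(LongCache *c, Oid value_id)
{
    assert(c != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++)
    {
        if (c->slots[i].data != NULL && c->slots[i].value_id == value_id)
        {
            c->slots[i].last_used = ++c->clock;
            return c->slots[i].data;
        }
    }
    return NULL;
}

/*
 * Cache a copy of data under value_id, evicting the least recently
 * used slot.  Empty slots carry last_used == 0 and are taken first.
 * Callers are expected to try cache_lookup() before storing.
 */
void
cache_store(LongCache *c, Oid value_id, const char *data)
{
    int     victim = 0;
    size_t  len;
    char   *copy;

    assert(c != NULL && data != NULL);
    for (int i = 1; i < CACHE_SLOTS; i++)
        if (c->slots[i].last_used < c->slots[victim].last_used)
            victim = i;

    len = strlen(data) + 1;
    copy = malloc(len);
    free(c->slots[victim].data);
    if (copy != NULL)
        memcpy(copy, data, len);
    c->slots[victim].value_id = value_id;
    c->slots[victim].data = copy;
    c->slots[victim].last_used = (copy != NULL) ? ++c->clock : 0;
}
```

With such a cache in front of the expansion relation, expand_long() would only have to fetch chunks on a miss.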

And the LONG index stuff should be covered here already (free
lunch)! Index_insert() MUST always be called after
heap_insert()/heap_update(), because it needs the CTID
assigned there. So at that time, the moved off attributes
have already been replaced in the tuple data by the
references. These will be stored instead of the values that
originally were in the tuple. Should also work with hash
indices, as long as the hashing functions use VAR_GETPLAIN as
well.

If we want to use auto compression too, no problem. We code
this into another bit of the first 32-bit vl_len. The
question of whether to call expand_long() now changes to "is
one of these bits set". This way, we can store both
compressed and uncompressed values in either the "primary"
tuple or the "expansion" relation. expand_long() will take
care of it.
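A sketch of the "is one of these set" test with a second flag bit. Which bit would carry the compression flag is not stated above; the second-highest bit and all names below are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VL_FLAG_LONG        0x80000000u  /* value lives in the expansion relation */
#define VL_FLAG_COMPRESSED  0x40000000u  /* hypothetical: value is compressed */
#define VL_FLAG_MASK        (VL_FLAG_LONG | VL_FLAG_COMPRESSED)
#define VL_SIZE_MASK        0x3FFFFFFFu  /* 30 bits left for the size */

/* Nonzero if expand_long() would have to be called for this header. */
int
vl_needs_expand(uint32_t vl_len)
{
    return (vl_len & VL_FLAG_MASK) != 0;
}

/* The size field with both flag bits stripped. */
uint32_t
vl_stored_size(uint32_t vl_len)
{
    return vl_len & VL_SIZE_MASK;
}

/* Build a header word from a size and any combination of the flags. */
uint32_t
vl_make_header(uint32_t size, uint32_t flags)
{
    assert(size <= VL_SIZE_MASK);
    assert((flags & ~VL_FLAG_MASK) == 0);
    return flags | size;
}
```

The four flag combinations then correspond to plain, compressed in the tuple, moved off, and moved off in compressed form.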

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #
