Re: [HACKERS] LONG

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jan Wieck <wieck(at)debis(dot)com>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] LONG
Date: 1999-12-14 01:56:35
Message-ID: 199912140156.UAA22211@candle.pha.pa.us
Lists: pgsql-hackers


This outline is perfect!

> > I am suggesting the longoid is not the oid of the primary or long*
> > table, but a unique id we assigned just to number all parts of the long*
> > tuple. I thought that's what your oid was for.
>
> It's not even an Oid of any existing tuple, just an
> identifier to quickly find all the chunks of one LONG value
> by (non-unique) index.

Yes, I understood this and I think it is a great idea. It allows UPDATE
to control whether it wants to replace the LONG value.

>
> My idea is this now:
>
> The schema of the expansion relation is
>
>     value_id      Oid
>     chunk_seq     int32
>     chunk_data    text
>
> with a non unique index on value_id.

Yes, exactly.
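
For my own understanding, storing one moved-off value would then boil
down to something like this (LONG_CHUNK_SIZE and store_chunk() are
made-up names, just to sketch the idea):

    /*
     * Hedged sketch: split a plain varlena into chunk_seq-numbered
     * pieces that become (value_id, chunk_seq, chunk_data) rows in
     * the expansion relation.
     */
    static void
    store_long_value(Relation exprel, Oid value_id, text *value)
    {
        int32   total = VARSIZE(value) - VARHDRSZ;
        int32   off;
        int32   seq = 0;

        for (off = 0; off < total; off += LONG_CHUNK_SIZE, seq++)
        {
            int32   len = Min(LONG_CHUNK_SIZE, total - off);

            /* store_chunk() would heap_insert() one chunk row */
            store_chunk(exprel, value_id, seq, VARDATA(value) + off, len);
        }
    }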

>
> We change heap_formtuple(), heap_copytuple() etc. not to
> allocate the entire thing in one palloc(). Instead the tuple
> portion itself is allocated separately and the current memory
> context remembered too in the HeapTuple struct (this is
> required below).

I read the later part. I understand.
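
So I picture the struct growing roughly like this (just my guess at
the field names, not actual code):

    /*
     * Hedged sketch of the extended HeapTuple: t_data becomes a
     * separate allocation and t_datamcxt remembers the context it
     * was made in.
     */
    typedef struct HeapTupleData
    {
        uint32           t_len;       /* length of *t_data */
        ItemPointerData  t_self;      /* SelfItemPointer (CTID) */
        MemoryContext    t_datamcxt;  /* context t_data was palloc()'d in */
        HeapTupleHeader  t_data;      /* tuple header and data */
    } HeapTupleData;

    typedef HeapTupleData *HeapTuple;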

>
> The long value reference in a tuple is defined as:
>
>     vl_len        int32;     /* high bit set, 32-bit = 18 */
>     vl_datasize   int32;     /* real vl_len of long value */
>     vl_valueid    Oid;       /* value_id in expansion relation */
>     vl_relid      Oid;       /* Oid of "expansion" table */
>     vl_rowid      Oid;       /* Oid of the row in "primary" table */
>     vl_attno      int16;     /* attribute number in "primary" table */

I see you need vl_rowid and vl_attno so you don't accidentally reference
a LONG value twice. Good point. I hadn't thought of that.
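
In C terms that reference would just be this struct, right (name made
up, only restating your layout):

    /* Hedged sketch of the long value reference, mirroring the above */
    typedef struct LongRef
    {
        int32   vl_len;       /* high bit set, remaining bits = 18 */
        int32   vl_datasize;  /* real vl_len of the long value */
        Oid     vl_valueid;   /* value_id in expansion relation */
        Oid     vl_relid;     /* Oid of "expansion" table */
        Oid     vl_rowid;     /* Oid of the row in "primary" table */
        int16   vl_attno;     /* attribute number in "primary" table */
    } LongRef;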

>
> The tuple given to heap_update() (the most complex one) can
> now contain usual VARLENA values of the format
>
> high-bit=0|31-bit-size|data
>
> or if the value is the result of a scan eventually
>
> high-bit=1|31-bit=18|datasize|valueid|relid|rowid|attno
>
> Now there are a couple of different cases.
>
> 1. The value found is a plain VARLENA that must be moved
> off.
>
> To move it off a new Oid for value_id is obtained, the
> value itself stored in the expansion relation and the
> attribute in the tuple is replaced by the above structure
> with the values 1, 18, original VARSIZE(), value_id,
> "expansion" relid, "primary" tuples Oid and attno.
>
> 2. The value found is a long value reference that has our
> own "expansion" relid and the correct rowid and attno.
> This would be the result of an UPDATE without touching
> this long value.
>
> Nothing to be done.
>
> 3. The value found is a long value reference of another
> attribute, row or relation and this attribute is enabled
> for move off.
>
> The long value is fetched from the expansion relation it
> is living in, and the same as for 1. is done with that
> value. There's space for optimization here, because we
> might have room to store the value plain. This can happen
> if the operation was an INSERT INTO t1 SELECT FROM t2,
> where t1 has a few small attributes plus one varsize one, while
> t2 has many, many long varsizes.
>
> 4. The value found is a long value reference of another
> attribute, row or relation and this attribute is disabled
> for move off (either per column or because our relation
> does not have an expansion relation at all).
>
> The long value is fetched from the expansion relation it
> is living in, and the reference in our tuple is replaced
> with this plain VARLENA.

Yes.
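
To restate the four cases as a sketch of the per-attribute loop in
heap_update() (all helper names here, like att_is_longref() and
move_off(), are made up for illustration):

    /* Hedged pseudocode for the decision made on each attribute */
    for (attno = 0; attno < natts; attno++)
    {
        Datum   val = values[attno];

        if (!att_is_longref(val))
        {
            /* case 1: plain VARLENA, move it off if enabled */
            if (column_moveoff_enabled(rel, attno))
                values[attno] = move_off(rel, rowid, attno, val);
        }
        else if (refers_to_us(val, rel, rowid, attno))
        {
            /* case 2: our own untouched long value, nothing to do */
        }
        else if (column_moveoff_enabled(rel, attno))
        {
            /* case 3: someone else's reference, fetch and re-store */
            values[attno] = move_off(rel, rowid, attno,
                                     fetch_long(val));
        }
        else
        {
            /* case 4: no expansion relation here, inline it plain */
            values[attno] = fetch_long(val);
        }
    }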

>
> This in-place replacement of values in the main tuple is the
> reason why we have to make a separate allocation for the tuple
> data and remember the memory context it was made in. Due to the
> above process, the tuple data can expand, and we then need to
> change into that context and reallocate it.

Yes, got it.
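
Something like this for the reallocation step, I assume (new_len
being whatever the expanded tuple data needs):

    /*
     * Hedged sketch: grow the tuple data inside the context it was
     * originally allocated in, using the remembered t_datamcxt.
     */
    MemoryContext   oldcxt = MemoryContextSwitchTo(tuple->t_datamcxt);

    tuple->t_data = (HeapTupleHeader) repalloc(tuple->t_data, new_len);
    tuple->t_len = new_len;
    MemoryContextSwitchTo(oldcxt);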

>
> What heap_update() further must do is to examine the OLD
> tuple (that it already has grabbed by CTID for header
> modification) and delete all long values by their value_id,
> that aren't any longer present in the new tuple.

Yes, that makes vacuum run fine on the LONG* relation.
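
Roughly like this, I suppose (get_long_ref(), long_value_used() and
long_delete() are made-up helper names):

    /*
     * Hedged sketch: after forming the new tuple, delete every long
     * value the old tuple referenced that the new tuple no longer
     * uses.
     */
    for (attno = 1; attno <= numberOfAttributes; attno++)
    {
        LongRef    *oldref = get_long_ref(oldtup, attno);  /* NULL if plain */

        if (oldref != NULL &&
            !long_value_used(newtup, oldref->vl_valueid))
            long_delete(expansion_rel, oldref->vl_valueid);
    }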

>
> The VARLENA arguments to type specific functions now can also
> have both formats. The macro
>
> #define VAR_GETPLAIN(arg) \
> (VARLENA_ISLONG(arg) ? expand_long(arg) : (arg))
>
> can be used to get a pointer to an always plain
> representation, and the macro
>
> #define VAR_FREEPLAIN(arg,userptr) \
> if (arg != userptr) pfree(userptr);
>
> is to be used to tidy up before returning.

Got it.

>
> In this scenario, a function like smaller(text,text) would
> look like
>
> text *
> smaller(text *t1, text *t2)
> {
>     text   *plain1 = VAR_GETPLAIN(t1);
>     text   *plain2 = VAR_GETPLAIN(t2);
>     text   *result;
>
>     if ( /* whatever to compare plain1 and plain2 */ )
>         result = t1;
>     else
>         result = t2;
>
>     VAR_FREEPLAIN(t1,plain1);
>     VAR_FREEPLAIN(t2,plain2);
>
>     return result;
> }

Yes.

>
> The LRU cache used in expand_long() will make the repeated
> expansion cheap enough. The benefit would be that
> huge values resulting from table scans will be passed around
> in the system (in and out of sorting, grouping etc.) until
> they are modified or really stored/output.

Yes.
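
I imagine expand_long() roughly like this (the cache helpers and
fetch_chunks() are made-up names; the point is just that repeated
expansion hits the LRU cache instead of the expansion relation):

    /* Hedged sketch of expand_long() */
    text *
    expand_long(LongRef *ref)
    {
        text   *plain = long_cache_lookup(ref->vl_valueid);
        text   *result;

        if (plain == NULL)
        {
            /* index scan on value_id, chunks concatenated in order */
            plain = fetch_chunks(ref->vl_relid, ref->vl_valueid,
                                 ref->vl_datasize);
            long_cache_insert(ref->vl_valueid, plain);
        }

        /* hand back a private copy so VAR_FREEPLAIN() can pfree() it */
        result = (text *) palloc(VARSIZE(plain));
        memcpy(result, plain, VARSIZE(plain));
        return result;
    }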

>
> And the LONG index stuff should be covered here already (free
> lunch)! Index_insert() MUST always be called after
> heap_insert()/heap_update(), because it needs the CTID
> assigned there. So at that time, the moved off attributes are
> replaced in the tuple data by the references. These will be
> stored instead of the values that originally were in the
> tuple. Should also work with hash indices, as long as the
> hashing functions use VAR_GETPLAIN as well.

I hoped this would be true. Great.

>
> If we want to use auto compression too, no problem. We code
> this into another bit of the first 32-bit vl_len. The
> question whether to call expand_long() now becomes "is one of
> these bits set". This way, we can store both compressed and
> uncompressed values in either the "primary" tuple or the
> "expansion" relation. expand_long() will take care of it.

Perfect. Sounds great.
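
Something like this, I suppose (the actual bit values are only
illustrative):

    /* Hedged sketch: two flag bits carved out of the first 32-bit word */
    #define VARLENA_FLAG_LONG        0x80000000
    #define VARLENA_FLAG_COMPRESSED  0x40000000

    /* "is one of these set" replaces the plain high-bit test */
    #define VARLENA_NEEDS_EXPAND(arg) \
        ((*((int32 *) (arg)) & \
          (VARLENA_FLAG_LONG | VARLENA_FLAG_COMPRESSED)) != 0)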

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
