Re: [HACKERS] LONG

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jan Wieck <wieck(at)debis(dot)com>
Cc: peter_e(at)gmx(dot)net, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] LONG
Date: 1999-12-11 23:25:12
Message-ID: 199912112325.SAA13157@candle.pha.pa.us
Lists: pgsql-hackers

> Bruce Momjian wrote:
>
> > > While this is great and all, what will happen when long tuples finally get
> > > done? Will you remove this, or keep it, or just make LONG and TEXT
> > > equivalent? I fear that elaborate structures will be put in place here
> > > that might perhaps only be of use for one release cycle.
> >
> > I think the idea is that Jan's idea is better than chaining tuples.
>
> Just as Tom already pointed out, it cannot completely replace
> tuple chaining because of the atomicy assumption of single
> fsync(2) operation in current code. Due to this, we cannot
> get around the cases LONG will leave open by simply raising
> BLKSIZE, we instead need to tackle that anyways.

Actually, in looking at the fsync() system call, it does write the
entire file descriptor before marking the transaction as complete, so
there is no hard reason not to raise it, but because the OS has to do
two reads to get 16k, I think we are better keeping 8k as our base block
size.

Jan's idea is not to chain tuples, but to keep tuples at 8k, and instead
chain out individual fields into 8k tuple chunks, as needed. This seems
like it makes much more sense. It uses the database to recreate the
chains.

Let me mention a few things. First, I would like to avoid a LONG data
type if possible. Seems a new data type is just going to make things
more confusing for users.

My idea is a much more limited one than Jan's. It is to have a special
-1 varlena length when the data is chained on the long relation. I
would do:

-1|oid|attno

in 12 bytes. That way, you can pass this around as long as you want,
and just expand it in the varlena textout and compare routines when you
need the value. That prevents the tuples from changing size while being
processed. As far as I remember, there is no need to see the data in
the tuple except in the type comparison/output routines.
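
To make the 12-byte stub concrete, here is a rough sketch of what it might look like in C. The names LongRef, LONG_MARKER, long_ref_make and datum_is_long are made up for illustration and are not existing backend code, and the three 4-byte fields are only an assumption that happens to add up to the 12 bytes mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LONG_MARKER (-1)    /* the special varlena length described above */

/* Hypothetical layout of the "-1|oid|attno" stub: three 4-byte fields. */
typedef struct LongRef
{
    int32_t  len;       /* LONG_MARKER when the value is chained out */
    uint32_t oid;       /* the "oid" field of the stub */
    int32_t  attno;     /* the "attno" field of the stub */
} LongRef;

/* Build a stub. Nothing is fetched here; the stub is just passed around. */
static LongRef
long_ref_make(uint32_t oid, int32_t attno)
{
    LongRef ref;

    assert(attno > 0);
    ref.len = LONG_MARKER;
    ref.oid = oid;
    ref.attno = attno;
    return ref;
}

/* True if the datum begins with the -1 marker instead of a real length. */
static int
datum_is_long(const void *datum)
{
    int32_t len;

    memcpy(&len, datum, sizeof(len));
    return len == LONG_MARKER;
}
```

Only the output and comparison routines would need to call something like datum_is_long() and go fetch the real value. Everything else would copy the fixed-size stub around unchanged.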

Now it would be nice if we could set the varlena length to 12, its
actual length, and then just somehow know that the varlena of 12 was a
long data entry. Our current varlena has a maximum length of 64k. I
wonder if we should grab a high bit of that to trigger long. I think we
may be able to do that, and just do an AND mask to remove the bit to see
the length. We don't need the high bit because our varlenas can't be
over 32k. We can modify VARSIZE to strip it off, and make another
macro like ISLONG to check for that high bit.
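
A rough sketch of the masking idea, assuming a 16-bit length word as implied by the 64k figure above, with the top bit borrowed as the flag. The SKETCH_ prefix is there so these are not mistaken for the real VARSIZE macro; LONG_FLAG, LEN_MASK and sketch_make_header are likewise invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 16-bit length word; bit 15 is the "long" flag. */
#define LONG_FLAG ((uint16_t) 0x8000)
#define LEN_MASK  ((uint16_t) 0x7FFF)

/* Length with the flag bit stripped off by an AND mask. */
#define SKETCH_VARSIZE(hdr) ((uint16_t) ((hdr) & LEN_MASK))

/* Nonzero when the high bit marks the value as a long data entry. */
#define SKETCH_ISLONG(hdr)  (((hdr) & LONG_FLAG) != 0)

/* Combine a length (which must fit in 15 bits) with the optional flag. */
static uint16_t
sketch_make_header(uint16_t len, int is_long)
{
    assert(len <= LEN_MASK);
    return (uint16_t) (len | (is_long ? LONG_FLAG : 0));
}
```

With this, a long entry would carry its true length of 12 plus the flag, so code that only looks at the masked length keeps working without knowing about long values.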

Seems this could be done with little code.

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
