Re: [HACKERS] Last thoughts about LONG

From: wieck(at)debis(dot)com (Jan Wieck)
To: peter_e(at)gmx(dot)net
Cc: wieck(at)debis(dot)com, pgman(at)candle(dot)pha(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Last thoughts about LONG
Date: 1999-12-11 18:29:59
Message-ID: m11wrHL-0003kGC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers

Peter Eisentraut wrote:

> Before I start the nagging, please be aware that I'm not as smart as I
> think I am. Long datatypes of some sort are clearly necessary -- more
> power to you.

So be it. It forces me to think it over again and points to
sections I might have forgotten so far. Also, it has happened
to me more than once that writing a totally OBVIOUS answer
triggered a better solution in my brain (dunno what's wrong
with that brain, but sometimes it needs to be shaken well
before use). Thus, any of your notes can help, and that
counts!

>
> > - A new field "rellongrelid" type Oid is added to pg_class.
> > It contains the Oid of the long-value relation or the
> > invalid Oid for those who have no LONG attributes.
>
> I have a mixed feeling about all these "sparse" fields everywhere. Doing
> it completely formally, this seems to be a one-to-many relation, so you
> should put the referencing field into the pg_long table or whatever
> structure you use, pointing the other way around. This is probably slower,
> but it's cleaner. As I mentioned earlier, this whole arrangement will
> (hopefully) not be needed for all too long, and then we wouldn't want to
> be stuck with it.

It's 4 bytes per RELATION in pg_class. As a side effect, the
information will be available at NO COST immediately after
heap_open(), and in every place where a relation is accessed.
So it is the best place to put it.
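
To illustrate (a minimal sketch only; it assumes the new
pg_class column shows up in the cached Form_pg_class struct
under the name rellongrelid, and the lock mode is just an
example):

    Relation    rel = heap_open(relid, AccessShareLock);
    Oid         longrelid = rel->rd_rel->rellongrelid;

    if (OidIsValid(longrelid))
    {
        /*
         * This table has LONG attributes; the Oid of its long-value
         * relation is already in memory - no extra catalog lookup.
         */
    }

    heap_close(rel, AccessShareLock);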

>
> > - At CREATE TABLE, a long value relation named
> > "_LONG<tablename>" is created for those tables who need it.
>
> Please don't forget, this would require changes to pg_dump and psql. Also,
> the COPY command might not be able to get away without changes, either.

Oh yes, thanks. That was a point I forgot!

psql must not list tables that begin with "_LONG" in response
to a \d request. Anything else should IMHO be transparent.
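
Roughly like this (an illustration, not the actual psql
source; the query text and buffer handling are made up, only
the extra relname test is the point):

    #include <stdio.h>

    /*
     * Build the \d listing query; hide the long-value relations
     * the same way "pg_*" system relations are already hidden.
     */
    static void
    build_list_tables_query(char *buf, size_t buflen)
    {
        snprintf(buf, buflen,
                 "SELECT relname FROM pg_class "
                 "WHERE relkind = 'r' "
                 "  AND relname !~ '^pg_' "
                 "  AND relname !~ '^_LONG'");
    }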

pg_dump either uses a SELECT to build a script that INSERTs
the data via SQL, or uses COPY. In the SELECT/INSERT case, my
implementation would again be totally transparent and not
noticed by pg_dump, except that it must IGNORE "_LONG*"
relations and be aware that really big tuples can be sent,
but that's more a libpq question I think (which I already
checked, because the view/rule/PL combo I created to
demonstrate a >128K tuple was done through psql). AFAIK,
pg_dump doesn't use a binary COPY, and looking at the code
tells me that this is transparent too (due to the use of
type-specific input/output functions there).

All pg_dump would have to do is to ignore "_LONG*" relations
too.
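
On the pg_dump side the test could be as simple as this
(purely an illustration; the helper name and where it would
be called from are my assumptions, not part of the proposal):

    #include <string.h>

    /*
     * Skip the internal long-value relations when collecting the
     * tables to dump; their contents travel implicitly with the
     * master table.
     */
    static int
    is_long_value_relation(const char *relname)
    {
        return strncmp(relname, "_LONG", 5) == 0;
    }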

The real problem is COPY. In the case of a COPY BINARY, it
outputs the data portion of the fetched tuples directly. But
these would contain only the LongData headers, not the data
itself.

So at that point, COPY has to do the reverse of
heap_insert(): rebuild a faked tuple in which all the
non-NULL LONG values are placed in the representation they
would have after type input. Not a big deal, but it must be
done with the same care as the changes in heapam, so as not
to leave unfreed, leaked memory around.
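
Something along these lines (a rough sketch only;
att_is_long() and long_fetch_datum() are placeholders for
whatever the patch will call these things - the point is just
the shape of the "reverse heap_insert()" step COPY BINARY
would need before writing a tuple out):

    #include "postgres.h"
    #include "access/heapam.h"

    static HeapTuple
    expand_long_values(Relation rel, HeapTuple tuple, TupleDesc tupdesc)
    {
        int       natts  = tupdesc->natts;
        Datum    *values = (Datum *) palloc(natts * sizeof(Datum));
        char     *nulls  = (char *) palloc(natts * sizeof(char));
        HeapTuple expanded;
        int       i;

        for (i = 0; i < natts; i++)
        {
            bool    isnull;
            Datum   d = heap_getattr(tuple, i + 1, tupdesc, &isnull);

            nulls[i] = isnull ? 'n' : ' ';
            if (!isnull && att_is_long(tupdesc, i))     /* placeholder */
                d = long_fetch_datum(rel, d);           /* placeholder */
            values[i] = d;
        }

        /*
         * Build a throwaway tuple that looks as if it had just come
         * from the type input functions.
         */
        expanded = heap_formtuple(tupdesc, values, nulls);

        pfree(values);
        pfree(nulls);
        return expanded;    /* caller frees it after writing it out */
    }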

> In general, it wouldn't surprise me if some sections of the code would go
> nuts about the news of tuples longer than BLCKSZ coming along. (Where
> "nuts" is either 'truncation' or 'segfault'.)

The only place where the size of a heap tuple is checked (and
where the "tuple size too big" message comes from) is in
hio.c, right before the tuple is copied into the block. Up to
that point, a tuple is NOT explicitly limited to any size.
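
For reference, the check has roughly this shape (paraphrased
from hio.c, not a verbatim quote):

    /*
     * Nothing above this level enforces a limit; an oversized tuple
     * only fails once it is about to be placed into a block.
     */
    if (len > MaxTupleSize)
        elog(ERROR, "Tuple is too big: size %u", (unsigned int) len);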

So I would be glad to see crashes come up from this change
(during BETA of course, not after release). It would help us
get another existing bug out of the code.

> I guess what I'm really saying is that I'd be totally in awe of you if you
> could get all of this (and RI) done by Feb 1st. Good luck.

Thanks for the flowers, but "awe" is far too much - sorry.

Over the years I have had my hands on nearly every part of
the code involved in this, so I'm not a newbie at creating
data types, utility commands or doing syscat changes. The
LONG type I described will be the work of two or three
nights.

I already intended to tackle the long-tuple problem next.
What was missing was the idea of how to simply AVOID it. And
I got that idea just while answering a question about storing
big text files in the database on the [SQL] list - that woke
me up.

In contrast to the RI stuff, this time I don't expect any
bugs, because so far I have noticed absolutely no side
effects. On the RI stuff, we discussed tuple visibility
during concurrent transactions for weeks (if not months), and
I finally ran into exactly those problems anyway.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #
