Re: Happy column adding (was RE: [HACKERS] Happy column dropping)

From: "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Happy column adding (was RE: [HACKERS] Happy column dropping)
Date: 2000-01-25 17:44:53
Message-ID: 20000125114453.E423@rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 25, 2000 at 08:01:25AM -0800, Don Baccus wrote:
> At 11:55 AM 1/25/00 +0100, Peter Eisentraut wrote:
> >On Tue, 25 Jan 2000, Hiroshi Inoue wrote:
> >
> >> Even default is not allowed in ADD COLUMN now.
> >> There may be other reasons why they aren't allowed.
> >
> >It's not a matter of *allowed*, it's a parsing deficiency. The fact that
> >there was a default declared gets silently ignored. If y'all allow ;) I
> >would like to fix that (have already started a bit) by perusing the code
> >in parse_func.c:transformCreateStmt and do the same for the alter table
> >add column part. Maybe and add/drop constraint will come out in the end as
> >well.
>
> However, heap_getattr still won't see the default since it simply
> checks to see of the attribute number falls off the end of the
> tuple and then returns null.
>

Right, but that just means that existing tuples would not get the
default, but all new tuples would, right? So, while it does leave the
data inconsistent with the schema definition, it is fixable, and in a
controlled manner. A simple

UPDATE my_table SET new_field='default' WHERE new_field IS NULL;

should do it, right?

In fact, that's something I liked about the 'make invisible' strategy
for the ALTER DROP COLUMN case: it allows the DBA to control the
backends activity. If the DBA needs to drop a column from a large table,
but doesn't have space for 2X that table, what does she do? With the
invisible column, she could ALTER DROP, then do a series of updates,
similar to what Tom suggested:

UPDATE my_table SET otherfield=otherfield where table_id>0 and table_id<=100;
VACUUM mytable;
UPDATE my_table SET otherfield=otherfield where table_id>100 and table_id<=200;
VACUUM mytable;

etc.

Similarly, the aftermath of the ADD DEFAULT case could be handled in a
controlled manner, without forcing a 2X table size disk usage. I _like_
implementations that give the user (in this case, the DBA) control over
what happens, and when. I think this may answer Marc Fournier's desire
for a 'rewrite in place' version of these, since it would allow the
DBA, at their option, to update one tuple at a time (well, it'd be a royal
pain, but could be done...)

> There's no provision for then pulling out the default value and
> returning it instead.

Right, we don't want to special case it. What's wrong with returning a
NULL, for any tuple that hasn't been updated yet?

>
> I think this is why Tom was implying that add column should really
> alter the table?
>
> A fully-featured "add column" feature would be very nice, indeed.
>

I agree, where full-featured means adding constraints. I disagree that
rewriting the entire table is a good idea. This isn't even only an edge
case for admins with little disk space. I could very easily imagine
a schema chamge on existing data, where one has a 'flatfile' sort of
table in the tens of gigabytes range, and you want to remove a column,
in order to normalize the table (get rid of address2, zip2, state2,
..., for example). Requiring transient diskspace of 2X the table,
and a complete rewrite, _as each column is dropped_ whould be really
annoying. Yes, I know, dump/restore, but what about 24/7 systems? Hey,
it's dangerous to do surgery on a live system, but sometimes, you have
little choice.

I realize I just slipped over from the ADD COLUMN to the DROP COLUMN
case, but I think fundamentally, they're very similar, and will probably
get implemented with the same mechanism.

So Bruce, would the inviso-columns really be a mess in the source? We've
already got the 'virtual column' case, with the ADD COLUMN aftermath.

Perhaps Oliver's suggestion of removing the link between logical and
physical field ordering might help. That could lead to some interesting
storage optimizations, as well. Collecting all variable types to the
end of a tuple, for example, might be a win.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm(at)rice(dot)edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ross J. Reedstrom 2000-01-25 18:29:07 Re: Happy column adding (was RE: [HACKERS] Happy column dropping)
Previous Message Tom Lane 2000-01-25 17:39:25 Re: Happy column adding (was RE: [HACKERS] Happy column dropping)