Re: Truncate if exists

From: "Stafford, David x78061" <David(dot)Stafford(at)broadridge(dot)com>
To: "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Truncate if exists
Date: 2012-10-16 18:12:10
Message-ID: 6036D8D59B5D3B449F35AF3D871D010022DFFE12@JSCPCWEXMAA1.bsg.ad.adp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 15 Oct 2012 10:21:18 -0700, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Yeah, I think the functionality that we need is pretty much there
> already today. What we need to do is to get the syntax to a point
> where people can write the code they want to write without getting
> tangled up by it.
>
> I think the invention of DO was a big step in the right direction
> ...
> With DO, you can write the logic you want
> as an SQL statement, it's just a clunky and awkward SQL statement. In
> my view the goal ought to be to refine that mechanism to remove the
> clunkiness and awkwardness, rather than to invent something completely
> new.

As someone who has worked with a number of databases now, none of them really get this DDL integration completely right. What I'd like to see is 1) a predicate to easily test things about the schema (does this table, column, index, schema, etc. exist? does it have the right type?) and 2) a way to conditionally execute DDL (and DML, which should fall right out, but it isn't really what this discussion is covering). I would propose extending the current EXISTS / NOT EXISTS predicate as follows:

[NOT] EXISTS TABLE tab [ ( col [type] [ , col [type]]... ) ]
[NOT] EXISTS COLUMN tab.col [type]
[NOT] EXISTS INDEX tab.idx [ ( col [ , col]... ) ]
[NOT] EXISTS CONSTRAINT tab.cstrt -- not sure what else might be useful here
[NOT] EXISTS ( select ) -- this is the current EXISTS predicate, of course
[NOT] EXISTS ANY ROWS FROM tab [join] [WHERE predicate] [GROUP BY col [ , col]...] -- exactly the same as
-- (select 1 FROM etc.)
-- only because I like
-- it better
(the latter [which by no means am I nuts over; it's just that when extending EXISTS I can't stop neatening it up to my personal preferences] could be extended with [NOT] EXISTS MORE THAN n ROWS FROM and [NOT] EXISTS EXACTLY n ROWS FROM.)

There is a new SQL statement: IF predicate true-statement [ELSE false-statement].

To actually execute this new IF statement, the executor would need an IF node that evaluates the predicate (with ANDs and ORs, just like all SQL predicates) and then executes the rest only if the predicate came out TRUE (or NOT FALSE; I forget which is usually used, and the difference with NULL could be useful, as long as it matches other predicates). This moves one more bit of procedural logic into the executor.

Another wrinkle is that the dependent statement might not compile, due to missing tables or whatnot. Actually executing it while it doesn't compile is an error, but we want to defer that error until we actually decide we need to execute it. Also, it's probably good to try compiling it again at that point. So my thought would be to try planning the dependent statement(s); if they compile, hook them to the IF node; if not, hook a DEFERRED node to the IF node. The DEFERRED node has the parse tree (or raw string, whatever makes sense) of the statement; on execution it tries again to plan that statement; if it succeeds, run it; if not, error out.

I'd also add a SEQUENCE node to the executor. It just runs its children in order (could be n-ary, or if fixed arity nodes are what is in the planner/executor today, could be binary, first left, then right, and right could be another SEQUENCE). The DEFERRED node means that a CREATE statement could precede use of what is created in the same sequence and all could get planned (with some deferral to execution time) in advance and run in one lump. This implements DO at the executor level.

The biggest concepts left from plpgsql are looping and variables. Most variables could be modeled as a single row value; SQL already can update a row, so the planning of assignments and calculations of scalars (and arrays, I think) already fits into things the planner knows about. Table variables (which I don't know that plpgsql supports, but someday it should) are less defined. Adding plpgsql's loops to the executor would let whole functions run under one trip through the executor. This is beyond just improving the DDL support for scripts.

I have written a number of database upgrade scripts. Over time we've made them less fragile, by checking for the existence of tables, indexes, and most recently, columns. The usual sequence is:
1) check the existence of an index; check that the first few columns are correct; if not, drop the index
2) repeat for other indexes that have changed definition over time
3) check the existence of the table; create with current layout if it is missing
4) check for the presence of a column; if missing, alter the table to add it (of course, we can only add new columns at the end, and occasionally delete a column)
5) repeat for more columns
6) check the existence of an index; if missing, create it
7) repeat for all the indexes
This is doable in most databases, but pretty messy. You need to join with infoschema tables, or system tables, or use clunky functions to check for existence; checking types is usually pretty horrid. Consequently, we only check a few things and trust that the schema is only in a few different states. A true schema comparator and upgrade solver would be great, but I don't know anyone who has written such a thing. The extended EXISTS predicate that could check tables and indexes would declutter a lot of our upgrade scripts. That's the use case for me.

Much as I'd like to, I'm not volunteering to write this. And I'm not volunteering anyone else either, or demanding anything, or requesting or pleading. I did want to write it down and send it along just to clarify my thoughts. And if someday I get time, maybe I can learn enough internals to write a patch. But that day isn't today....

This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-10-16 19:39:42 Re: smgrsettransient mechanism is full of bugs
Previous Message Tom Lane 2012-10-16 17:29:09 Re: Global Sequences