Re: Postgres SQL Syntax

From: "Ted Byers" <r(dot)ted(dot)byers(at)rogers(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres SQL Syntax
Date: 2007-02-09 21:31:15
Message-ID: 045801c74c91$a1911fc0$6701a8c0@RnDworkstation
Lists: pgsql-general


----- Original Message -----
From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
>
> the open standard to convert data from one database to another,
> unfortunately, is SQL. SQL is incomplete, illogical, and obscure, so
> here we are.
>
The same can be said about any programming language, can it not?

Even languages as powerful and expressive as C++ and Perl have elements that
seem illogical. That they are incomplete is substantially demonstrated by the
existence of the Boost library (most of which ought to be added to the C++
standard library) and of CPAN. But for Perl there is no standard, so it is
hard to say definitively where the language ends and developers' libraries
begin. I am sure that all of the programming students I have taught have found
C++ template metaprogramming obscure and very difficult to fathom, but I don't
think it is possible at this time to beat Perl, and especially its object
model, for obscurity. I will concede, though, that this impression may be an
artifact of my being used to the object models in C++ and Java, and that
therefore the Perl object model just appears to me to be a bit weird. For the
same reason, I find some aspects of SQL difficult to fathom.
That may be because I am not as familiar with set theory as I am with
algebra, calculus, and analytic geometry, or it may be an artifact of the
languages with which I am most comfortable. I don't know whether you can do
statistical analyses, such as nonlinear least squares, time series analysis,
or chi-squared tests, within SQL, but when faced with such a problem I tend to
follow the path of least resistance: export the data from the database into my
client code and do whatever analysis I need using my C++ or Java code. But if
there is a lot of data, I suspect there would be much less impact on network
traffic, and probably better performance, if the analysis could be done in a
SQL stored procedure. One of the more common analyses I need to do involves
time series analysis, sometimes with some kind of smoothing (such as a moving
average) applied before the analysis proper.
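
To make that concrete, something along the following lines is roughly what I
have in mind. The table and column names are invented purely for illustration,
and the date arithmetic is PostgreSQL's; a plain correlated subquery is used
rather than anything more exotic, so it should run on fairly old releases:

    -- Hypothetical table: prices(obs_date DATE, price NUMERIC)
    -- Seven-day trailing moving average, computed server-side so that only
    -- the smoothed series need cross the network.
    SELECT p.obs_date,
           p.price,
           (SELECT AVG(p2.price)
              FROM prices p2
             WHERE p2.obs_date BETWEEN p.obs_date - 6 AND p.obs_date) AS ma_7
      FROM prices p
     ORDER BY p.obs_date;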

> moving data from mysql to postgresql is easy...its the table schemas
> that are tough. If you have the table schemas done, you can
>

When I design my databases, I normally assume that I may have to migrate the
data from one RDBMS product to another, for whatever reason. Therefore, I have
the ones I am most likely to have to support running here. I create a SQL
script to create my databases, aiming to use the highest common factor across
those RDBMSs' dialects of SQL, and test it on each to verify that I succeeded
in finding the highest common factor. I then have a simple Perl script to
execute that script; it "knows" which tools to use based on a configuration
file that has the specifics for using MySQL or PostgreSQL or MS SQL Server,
and a command-line parameter that specifies which DB to use.
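
By way of an invented example, the sort of DDL I mean sticks to plain ANSI
types and named constraints, and avoids the auto-increment features where the
products diverge, so the identical statement runs unchanged on PostgreSQL,
MySQL, and MS SQL Server:

    -- A deliberately vendor-neutral table; all names here are made up.
    -- No SERIAL, AUTO_INCREMENT, or IDENTITY column, since that is exactly
    -- where the products differ.
    CREATE TABLE customer (
        customer_id  INTEGER      NOT NULL,
        family_name  VARCHAR(100) NOT NULL,
        given_name   VARCHAR(100) NOT NULL,
        date_joined  DATE         NOT NULL,
        CONSTRAINT pk_customer PRIMARY KEY (customer_id)
    );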

> mysqldump --compatible=postgresql | psql
>
> which should work for 90% of tables, because mysql supports only a
> subset of the types postgresql supports.
>
> the schemas are a bit trickier...you have to do them by hand or use a
> conversion tool. one such tool is DTS. if you look around you might
> find something else though.
>
There is more than one way to skin a cat. My inclination is to take a brute
force, albeit brain-dead, approach. One of my irritants is that none of the
RDBMS products appear to support the same ways of importing data, e.g. from a
flat file (CSV files, for instance). But at least they all do it in some way,
and they are all able to export a table to, e.g., a CSV file. Therefore, it
becomes trivially easy to move data, even in the absence of a tool like DTS,
by exporting the data to a CSV file and then reading that file into the target
RDBMS. This is something I've done countless times. But, not having found a
way to easily determine the schema programmatically, so that my Perl or Java
or C++ code could be generic enough to apply to any of my databases, I find
myself taking an hour or two to write a simple script, usually in Perl, to
handle each table in sequence. There are, of course, things to watch, such as
keeping the table creation statements in a different script from the one that
creates indices and especially foreign keys, so that the tables are ready when
the data is to be loaded, but the constraints are created only after the data
is loaded, purely for performance reasons. I learned the hard way that loading
data slows down dramatically if the database has to continually check
constraints, so I create constraints after loading valid data, and leave them
in place only while new data is to be stored.
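
For what it is worth, the ordering I have in mind looks something like this on
the PostgreSQL side; the table, the file path, and the referenced orders table
are all hypothetical:

    -- 1. Bare table first: no indices or foreign keys yet.
    CREATE TABLE order_line (
        order_id    INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        quantity    INTEGER NOT NULL
    );

    -- 2. Bulk load the CSV file exported from the source RDBMS.
    COPY order_line FROM '/tmp/order_line.csv' WITH CSV;

    -- 3. Only now add the index and the foreign key, so that each row was
    --    not checked individually during the load.
    CREATE INDEX idx_order_line_order ON order_line (order_id);
    ALTER TABLE order_line
        ADD CONSTRAINT fk_order_line_order
        FOREIGN KEY (order_id) REFERENCES orders (order_id);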

I guess I am recommending that the schemas be done by hand, regardless of
whether you're creating a new database or repairing or migrating an old one,
even if the original developer wasn't considerate enough to create, or
provide, an appropriate SQL script to create the database de novo. It seems
to me to be risky to assume that the tools we like now will still exist ten
years from now. Anyone remember IBM's OS/2? That was my favourite OS, and
it was arguably better than any flavour of Windows available at the time.
But even though I used it exclusively ten years ago, it is no longer a
viable option for any machine I now use. I am sure anyone reading this who
is of my generation could produce a long list of products that they used and
liked which no longer exist or which are now no longer commercially viable.
C++ is still a good option for many kinds of application development, but I
don't think there are very many people using Watcom's compiler to produce
commercial applications. SQL will likely exist ten years from now, but will
our favourite RDBMS? I won't make that prediction, but I think I can
minimize the risk by creating SQL scripts that use the highest common factor
across the RDBMS products I have at my disposal. Therefore, I should be
able to use my scripts, with minimal pain, regardless of what RDBMS products
are available at that time.

I do much the same for my C++ coding. I try to use the highest common
factor in the language, as implemented by the suite of compilers I use, and
keep the implementation-specific stuff to a minimum in completely separate
compilation units (and even in different directories). I just see it as
inevitable that significant refactoring will be required, especially when
migrating from one suite of tools to another, so I plan an architecture for
my code that should make it as easy as possible, and similarly, for moving
data around, I developed my approach to be as easy and reliable as possible,
even if it is not the most elegant or efficient.

Cheers,

Ted
