On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver(at)comcast(dot)net> wrote:
> Define easily.
~
OK, let me try to outline the approach I would go for:
~
I think "COPY FROM CSV" should have three options, namely:
~
1) the way we have used it so far, in which you create the table first
~
2) another way in which defaults are declared, generally as:
~
2.1) aggressive: data type, value and formatting analysis is done; if
only 1s and 0s are found, declare the column BOOLEAN; if repeated data
is found (say, state codes) and the distinct values cover the rest of
the data, move the repeated data out to a separate extra table (they
have a name I can't recall now) and index it ...; if the data looks
numeric but contains forward slashes and/or hyphens, could the values
be dates? if they are definitely dates, convert them to bigint (and do
the formatting in the presentation code (which is also a win-win with
the i18n code)) ...
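To make the idea concrete, here is a minimal sketch of that kind of
aggressive inference; the function name, the thresholds and the date
regexp are all hypothetical choices of mine, not anything COPY does
today:

```python
import re

# Hypothetical date shape: numbers separated by "/" or "-"
DATE_RE = re.compile(r"^\d{1,4}[/-]\d{1,2}[/-]\d{1,4}$")

def infer_aggressive(values):
    """Guess a column type from sampled string values (illustrative only)."""
    distinct = set(values)
    if distinct <= {"0", "1"}:
        return "boolean"
    if all(DATE_RE.match(v) for v in values):
        # definitely date-shaped: store as bigint and leave the
        # formatting to the presentation/i18n layer, as suggested above
        return "bigint"
    if all(v.isdigit() for v in values):
        return "integer"
    # heavily repeated data (e.g. state codes): candidate for a
    # separate, indexed lookup table
    if len(distinct) * 10 < len(values):
        return "lookup-table reference"
    return "varchar"
```

A real implementation would of course sample the CSV rather than scan
it all, and would need locale-aware date parsing instead of one regexp.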
~
2.2) conservative: data type and value analysis is done, but no
formatting analysis, and a more encompassing data type is selected:
say, for 1-or-0 data use a byte [0, 255]; for bytes use int; if
something could be encoded as char(2), use varchar instead, . . .
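That "pick the next-wider type" rule could be as simple as a lookup
table; this is my own sketch (note PostgreSQL has no single-byte
integer type, so I widen 1-or-0 data to smallint here):

```python
# Hypothetical widening ladder for the conservative mode: whatever the
# narrow inference says, promote it one step so borderline data fits.
WIDER = {
    "boolean": "smallint",   # 1-or-0 data stored as a small integer
    "smallint": "integer",
    "integer": "bigint",
    "char(2)": "varchar",    # fixed-width codes widened to varchar
}

def widen(inferred_type):
    """Return a more encompassing type than the narrow inference."""
    return WIDER.get(inferred_type, inferred_type)
```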
~
2.3) dumb: just use the coarsest data type possible; bigint for
anything that looks like a number and varchar for the rest
~
the "dumb" option should remind DBAs which option they are using,
quantify the consequences of their decision (larger DBs for no reason,
an approximate reduction in speed, . . .) and tell them how not to be
"dumb"
~
3) or you could define "import templates" declaring which specific
data types to use for data shaped a certain way, which could be
declared per column using regexps
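Such a template might just be an ordered list of (regexp, SQL type)
pairs tried per column; again, the names and rules below are my own
invented illustration, not a proposed syntax:

```python
import re

# Hypothetical "import template": per-column regexps mapped to the SQL
# type to use when every sampled value in the column matches.
TEMPLATE = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "date"),
    (re.compile(r"^-?\d+$"), "bigint"),
    (re.compile(r"^[A-Z]{2}$"), "char(2)"),
]

def type_from_template(values, template=TEMPLATE):
    """Return the first template type whose regexp matches every value."""
    for pattern, sql_type in template:
        if all(pattern.match(v) for v in values):
            return sql_type
    return "varchar"  # fallback when no rule matches
```

Rule order matters: the first fully-matching rule wins, so narrower
patterns should come before broader ones.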
~
> I could go on, but the point is that table data types require some thought on the part of the DBA.
~
Well, it still requires their minds and input, but they will have
jobs even if they get some help, don't you think so ;-)
~
lbrtchx