From: | David Rysdam <drysdam(at)ll(dot)mit(dot)edu> |
---|---|
To: | |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Date format for bulk copy |
Date: | 2004-10-13 17:43:00 |
Message-ID: | 416D6924.8090509@ll.mit.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Greg Stark wrote:
>David Rysdam <drysdam(at)ll(dot)mit(dot)edu> writes:
>
>
>
>>In my brute force port, I just bulk copied the date
>>fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY
>>HH:MI:SS:MSAM').
>>
>>
>
>
>
>>Again, I created a temporary table and did a decode(field, 'hex') to the
>>real table.
>>
>>
>
>This is the standard approach. You're rather lucky these are the only
>data representation changes you've had to do so far. I fear you'll run into
>more and more complex changes over time and trying to avoid the temporary
>table will get harder and harder.
>
>
>
No, I think I'm OK there. These are programmatically-generated values
and I've already been through them all once. Just the millisecond issue
and the hex binary issue AFAIK.
>If it were me I would consider processing the files in perl. It should be
>pretty easy to do both of these modifications very quickly.
>
>
>
Very quick and easy to do one time. A little trickier to handle in an
elegant, maintainable way for the dozens of data reloads I do every
month for GBs of data onto two different server types.
>If you really want to go with a custom C code then you might be able to just
>grab the byteain/byteaout functions from src/backend/util/adt/varlena into a
>separate module and create new functions with modified names. Load it with
>CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain';
>
>Or maybe create the function as my_byteain in postgres and then update the
>catalog entries somehow. I'm not sure how to do that but it shouldn't be too
>hard. And it might make it easier to do the substitution for the data load and
>then undo the change afterwards.
>
>
>
Why not create a type and then define the load function to be the
equivalent of "decode('hex')"?
>Doing the same for timmestamp is a bit trickier but you could copy
>ParseDateTime from datetime.c as a static function for your module.
>
>Be careful though, test this out thoroughly on a test database. I'm not sure
>of all the impacts of altering the in/out functions for data types. I expect
>it would break pg_dump, for example. And I would worry about the statistics
>tables too.
>
>
>
This is kind of a hybrid of my suggestions and the problems are a hybrid
as well. :)
1) Just change the timestamp type so that it allows a ':' delimiter for
milliseconds. Potential problems: Other parts of the code won't expect
it. People don't want that.
2) Create a new type. Potential problem: Things like date ranges
probably wouldn't work anymore, since the server wouldn't know it's a
date now.
From | Date | Subject | |
---|---|---|---|
Next Message | Robby Russell | 2004-10-13 17:59:42 | Re: Level of replication support? |
Previous Message | David Rysdam | 2004-10-13 17:32:01 | Re: Date format for bulk copy |