Re: postgres pg_bulkload c filter function in c programming

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Jan de Visser <jan(at)de-visser(dot)net>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, rajmhn <rajmhn(dot)ram(at)gmail(dot)com>
Subject: Re: postgres pg_bulkload c filter function in c programming
Date: 2016-12-29 19:04:51
Message-ID: CA+bJJby3n6pb6MccRVDvMjE0QTOy46Kk+8vFz86vtQSW6hY5+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi:

On Thu, Dec 29, 2016 at 3:01 PM, Jan de Visser <jan(at)de-visser(dot)net> wrote:
> On Thursday, December 29, 2016 5:10:08 AM EST rajmhn wrote:
...
>> I'm new to C. Gone through this documentation. Not clear, how to start.
...
> It seems to me it would be much easier to load the data into a temporary
> table, and from there transform into the form you need it to be. If you're not
> experienced in C (and probably even if you are) you can import *a lot* of data
> in the time it would take you to develop that custom filter.
> Besides, your requirements don't appear to be that difficult.

for his type of requirements I would recommend going the perl ( any
similar language will do, but after using it for 25 years I find it is
the king for that ) + pipe route. I mean, write a script which spits
"copy xxx from stdin\n" plus the transformed rows, pipe it trough
psql.

Total time is difficult to beat, as debugging is very easy, write the
filtering function using <> - chomp - split - s/// - join - print and
debug it by feeding it some lines with head.

Untested code from which I remembter of the specs, could be something like:

print "Copy my_table from stdin;\n";
my $e = "The magic col-e default value";
while(<>) {
chomp; # get rid of OPTIONAl line terminator
# Get the columns and do the c-d column swap
my ($a,$b,$d,$c)=split /\|/, $_;
# Add dashed to the date in column b:
$b =~ s/^(\d\d\d\d)(\d\d)(\d\d)$/$1-$2-$3/;
# zap not numerics a to 0:
($a=~/^\d+$/) or ($a = 0);
# And send it with the default separators ( scaping left as an
exercise to the reader, ask if needed, I've wrtten and tested it
several times ).
print join("\t", $a,$b,$c,$d,$e),"\n";
}

Test it with "head the-datafile | perl the_script.pl |
my_favourite_pager" until correct, the beauty of this approache is
you do not touch the db in debug, feed it to psql when done. In my
experience the perl script overhead is unnoticeable in any 2k+ machine
(and perl was specifically dessigned to be good at this kind of things
).

Francisco Olarte.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message rajmhn 2016-12-29 19:41:09 Re: postgres pg_bulkload c filter function in c programming
Previous Message Gerhard Wiesinger 2016-12-29 18:24:51 Re: vacuumdb --analyze-only scans all pages?