Re: JDBC behaviour

From: Vitalii Tymchyshyn <vit(at)tym(dot)im>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, John R Pierce <pierce(at)hogranch(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: JDBC behaviour
Date: 2016-02-20 17:34:39
Message-ID: CABWW-d3kagq=8EnXFRzc7Jzij08DNY0VbdT=ZcSa48rnuQXOYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-jdbc

Well, often in ETL it's not MY crappy data. It's something received from
upstream system and my task is to perform a load, not stop on the first
error and produce at least meaninful report on data crappyness statistics.
Extended cases may involve something like manual data fixing for error rows.

Just a real world example: in financial transactions sometimes you can
receive a transaction on a financial instrument that is brand new and is
not in your db yet. You don't want to fail the whole batch.

And yes, globally it's a move from "getting data" to "filtering this
[crappy] data ocean" going on.

Best regards, Vitalii Tymchyshyn

Сб, 20 лют. 2016 12:09 Bill Moran <wmoran(at)potentialtech(dot)com> пише:

> On Sat, 20 Feb 2016 16:29:09 +0000
> Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:
>
> > Well, I suppose replacing simple copy with procedural per-row function
> > would give huge performance hit. Also what method do you propose to use
> in
> > the code? Savepoints?
>
> Not at all. PL/PGSQL's ON ERROR handling can manage this without needing
> savepoints.
>
> > I'd say this would also add a more slowdown.
>
> What? The savepoints? Well, you don't need them. The stored
> procedure is going to incur a bit of a hit, though.
>
> > Also quite a bit of boilerplate code would be needed. It's similar to
> merge
> > statement. Yes, it can be achieved in pure SQL, but having clean merge
> > statement saves you a lot of headache and is usually much more efficient.
> > Basically, it's not that what OP needs is not doable at all, it's that
> > other RDBMs often has this functionality in much more convenient and
> > performance optimized way.
>
> True. I don't think "clean up my crappy data" has ever been a
> priority for PostgreSQL. Although, "allow the user to build whatever
> is needed" has been.
>
> I find it curious that those of us who become stewards of other people's
> data find ourselves bending over backwards to try to clean up their
> garbage data. It's an interesting social commentary on how software
> design has changed since the term GIGO was in common use.
>
> > Best regards, Vitalii Tymchyshyn
> >
> > ??, 20 ???. 2016 11:16 Bill Moran <wmoran(at)potentialtech(dot)com> ????:
> >
> > > On Sat, 20 Feb 2016 16:01:04 +0000
> > > Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:
> > >
> > > > Well, it OT here and belongs to -hackers, but as for me main use case
> > > here
> > > > is ETL or ELT process getting a lot of unvalidated external data.
> > > > And a good option to solve this problem is not to change transaction
> > > > semantics or slow down processing by adding tons of savepoints, but
> add
> > > "on
> > > > error" clause to insert/copy statement.
> > > >
> > > > This clause should allow to save records that can't fit into
> destination
> > > > table because of type, check of referential constaints into error
> table.
> > > > Oracle has similar functionality and we are using it in our project.
> No
> > > > error is generated - no transaction rollback, batch abort or similar.
> > > >
> > > > As for me it would cover 90% of use cases and would be really
> usefull.
> > > The
> > > > one problem I can see is with inserting into partition parent.
> > >
> > > PL/PGSQL provides this functionality. It requires the creation of a
> server
> > > side function and using that function to insert data, but it can do
> exactly
> > > what you're describing.
> > >
> > > While adding other mechanisms to make it "easier" or "more like some
> other
> > > software" might be valuable; the simple fact is that Postgres _does_
> > > support
> > > what you want. The fact that you're not aware of it doesn't change
> that.
> > >
> > > --
> > > Bill Moran
> > >
>
>
> --
> Bill Moran
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2016-02-20 17:41:20 Re: JDBC behaviour
Previous Message Tom Lane 2016-02-20 17:31:38 Re: JDBC behaviour

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Kellerer 2016-02-20 17:41:20 Re: JDBC behaviour
Previous Message Tom Lane 2016-02-20 17:31:38 Re: JDBC behaviour

Browse pgsql-jdbc by date

  From Date Subject
Next Message Thomas Kellerer 2016-02-20 17:41:20 Re: JDBC behaviour
Previous Message Tom Lane 2016-02-20 17:31:38 Re: JDBC behaviour