From: | Vitalii Tymchyshyn <vit(at)tym(dot)im> |
---|---|
To: | Bill Moran <wmoran(at)potentialtech(dot)com> |
Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, John R Pierce <pierce(at)hogranch(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org> |
Subject: | Re: JDBC behaviour |
Date: | 2016-02-20 17:34:39 |
Message-ID: | CABWW-d3kagq=8EnXFRzc7Jzij08DNY0VbdT=ZcSa48rnuQXOYg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers pgsql-jdbc |
Well, often in ETL it's not MY crappy data. It's something received from
upstream system and my task is to perform a load, not stop on the first
error and produce at least meaninful report on data crappyness statistics.
Extended cases may involve something like manual data fixing for error rows.
Just a real world example: in financial transactions sometimes you can
receive a transaction on a financial instrument that is brand new and is
not in your db yet. You don't want to fail the whole batch.
And yes, globally it's a move from "getting data" to "filtering this
[crappy] data ocean" going on.
Best regards, Vitalii Tymchyshyn
Сб, 20 лют. 2016 12:09 Bill Moran <wmoran(at)potentialtech(dot)com> пише:
> On Sat, 20 Feb 2016 16:29:09 +0000
> Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:
>
> > Well, I suppose replacing simple copy with procedural per-row function
> > would give huge performance hit. Also what method do you propose to use
> in
> > the code? Savepoints?
>
> Not at all. PL/PGSQL's ON ERROR handling can manage this without needing
> savepoints.
>
> > I'd say this would also add a more slowdown.
>
> What? The savepoints? Well, you don't need them. The stored
> procedure is going to incur a bit of a hit, though.
>
> > Also quite a bit of boilerplate code would be needed. It's similar to
> merge
> > statement. Yes, it can be achieved in pure SQL, but having clean merge
> > statement saves you a lot of headache and is usually much more efficient.
> > Basically, it's not that what OP needs is not doable at all, it's that
> > other RDBMs often has this functionality in much more convenient and
> > performance optimized way.
>
> True. I don't think "clean up my crappy data" has ever been a
> priority for PostgreSQL. Although, "allow the user to build whatever
> is needed" has been.
>
> I find it curious that those of us who become stewards of other people's
> data find ourselves bending over backwards to try to clean up their
> garbage data. It's an interesting social commentary on how software
> design has changed since the term GIGO was in common use.
>
> > Best regards, Vitalii Tymchyshyn
> >
> > ??, 20 ???. 2016 11:16 Bill Moran <wmoran(at)potentialtech(dot)com> ????:
> >
> > > On Sat, 20 Feb 2016 16:01:04 +0000
> > > Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:
> > >
> > > > Well, it OT here and belongs to -hackers, but as for me main use case
> > > here
> > > > is ETL or ELT process getting a lot of unvalidated external data.
> > > > And a good option to solve this problem is not to change transaction
> > > > semantics or slow down processing by adding tons of savepoints, but
> add
> > > "on
> > > > error" clause to insert/copy statement.
> > > >
> > > > This clause should allow to save records that can't fit into
> destination
> > > > table because of type, check of referential constaints into error
> table.
> > > > Oracle has similar functionality and we are using it in our project.
> No
> > > > error is generated - no transaction rollback, batch abort or similar.
> > > >
> > > > As for me it would cover 90% of use cases and would be really
> usefull.
> > > The
> > > > one problem I can see is with inserting into partition parent.
> > >
> > > PL/PGSQL provides this functionality. It requires the creation of a
> server
> > > side function and using that function to insert data, but it can do
> exactly
> > > what you're describing.
> > >
> > > While adding other mechanisms to make it "easier" or "more like some
> other
> > > software" might be valuable; the simple fact is that Postgres _does_
> > > support
> > > what you want. The fact that you're not aware of it doesn't change
> that.
> > >
> > > --
> > > Bill Moran
> > >
>
>
> --
> Bill Moran
>
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2016-02-20 17:41:20 | Re: JDBC behaviour |
Previous Message | Tom Lane | 2016-02-20 17:31:38 | Re: JDBC behaviour |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2016-02-20 17:41:20 | Re: JDBC behaviour |
Previous Message | Tom Lane | 2016-02-20 17:31:38 | Re: JDBC behaviour |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2016-02-20 17:41:20 | Re: JDBC behaviour |
Previous Message | Tom Lane | 2016-02-20 17:31:38 | Re: JDBC behaviour |