Re: Copy Bulk Ignore Duplicated

From: Leandro Guimarães <leo(dot)guimaraes(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Copy Bulk Ignore Duplicated
Date: 2019-06-15 02:26:33
Message-ID: CAJV35FMvQQ65Pdb11WoU5wPSgGG+=+8_os-CyO6HwPS1Pe5h5A@mail.gmail.com
Lists: pgsql-general

Hi Adrian,
I'll take a look at pg_bulkload, but I populate the database via a
Java application using JDBC.

I'll try the query you kindly sent to me!

Thanks!
Leandro Guimarães

On Fri, Jun 14, 2019 at 6:59 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 6/14/19 2:04 PM, Leandro Guimarães wrote:
> > Hi,
> > I have a scenario with a large table, and I'm trying to insert into it
> > via a COPY command from a CSV file.
> >
> > Everything works, but sometimes my source .csv file has rows that
> > duplicate data already in the previously populated table. If I add a
> > check constraint and try to run the COPY command, I get an error that
> > aborts the whole insertion.
> >
> > I've tried to put the data in a tmp table and fill the main table
> > using DISTINCT, this way (the fields and names are just examples):
> >
> > INSERT INTO final_table (name, document)
> > SELECT DISTINCT name, document
> > FROM tmp_TABLE t1
> > WHERE NOT EXISTS (
> > SELECT 1 FROM final_table t2
> > WHERE (t2.name, t2.document)
> > IS NOT DISTINCT FROM (t1.name, t1.document))
> >
> > The problem is that my final_table is a large (and partitioned) table,
> > and this query is taking a long time to execute.
> >
> > Does anyone have any idea (really, guys, anything would be great) how to
> > solve this situation? I need to ignore duplicates instead of getting an error.
>
> pg_bulkload?:
>
> https://ossc-db.github.io/pg_bulkload/pg_bulkload.html
> "PARSE_ERRORS = n
> The number of ignored tuples that throw errors during parsing,
> encoding checks, encoding conversion, FILTER function, CHECK constraint
> checks, NOT NULL checks, or data type conversion. Invalid input tuples
> are not loaded but are recorded in the PARSE BADFILE. The default is 0. If
> there are equal or more parse errors than the value, already loaded data
> is committed and the remaining tuples are not loaded. 0 means to allow
> no errors, and -1 and INFINITE mean to ignore all errors."
>
> >
> > I'm using *PostgreSQL 9.4*, so I can't use "ON CONFLICT", and an
> > upgrade is not an option.
> >
> > Thanks and Kind Regards!
> >
> >
> > Leandro Guimarães
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
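For reference, one common 9.4-compatible pattern for the problem discussed above is to COPY into an unlogged staging table, index it, and then insert with the anti-join from the original message. This is only a sketch under the thread's constraints (PostgreSQL 9.4, no ON CONFLICT); the table names, column names, and file path are illustrative, borrowed from the example in the quoted message:

```sql
-- Hypothetical staging-table sketch for PostgreSQL 9.4 (no ON CONFLICT).
-- Names (final_table, tmp_table, name, document, the CSV path) are
-- placeholders taken from the example in the thread.

CREATE UNLOGGED TABLE tmp_table (LIKE final_table INCLUDING DEFAULTS);

COPY tmp_table (name, document) FROM '/path/to/data.csv' WITH (FORMAT csv);

-- Index the lookup columns so the NOT EXISTS probe below can use an
-- index scan instead of scanning the large target for every staged row.
-- On a 9.4 inheritance-partitioned target, each child table needs its
-- own index on these columns for this to help.
CREATE INDEX ON tmp_table (name, document);

INSERT INTO final_table (name, document)
SELECT DISTINCT t1.name, t1.document
FROM tmp_table t1
WHERE NOT EXISTS (
    SELECT 1
    FROM final_table t2
    WHERE t2.name IS NOT DISTINCT FROM t1.name
      AND t2.document IS NOT DISTINCT FROM t1.document
);

DROP TABLE tmp_table;
```

With 9.4 inheritance partitioning, constraint exclusion only prunes child tables when the query constrains the partition key, so if each CSV batch covers a known key range, adding that predicate to the NOT EXISTS subquery may matter more than any index.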
