Copy Bulk Ignore Duplicated

From: Leandro Guimarães <leo(dot)guimaraes(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Copy Bulk Ignore Duplicated
Date: 2019-06-14 21:04:35
Message-ID: CAJV35FN-pL3-oYHVDpGUM0coD0zgcF8c+xgcAu+VmUs27JyCmg@mail.gmail.com
Lists: pgsql-general

Hi,

I have a large table, and I'm trying to insert data into it via a
COPY command with a CSV file.

Everything works, but sometimes my source .csv file contains rows that are
already present in the previously filled table. If I add a constraint and
try to run the COPY command, the resulting error aborts the whole insertion.
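For illustration, the failing load looks roughly like this (the file path and column names here are made up):

```sql
-- Hypothetical table/file names, just to show the failure mode.
COPY final_table (name, document)
FROM '/path/to/data.csv' WITH (FORMAT csv);
-- If any row violates the constraint, the error aborts the
-- entire COPY and no rows at all are loaded.
```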

I've tried putting the data into a temporary table and then filling the main
table using DISTINCT, like this (the field and table names are just examples):

INSERT INTO final_table (name, document)
SELECT DISTINCT t1.name, t1.document
FROM tmp_table t1
WHERE NOT EXISTS (
    SELECT 1
    FROM final_table t2
    WHERE (t2.name, t2.document)
          IS NOT DISTINCT FROM (t1.name, t1.document)
);

The problem is that my final_table is a large (and partitioned) table and
this query is taking a long time to execute.
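To be clear, the whole load path I'm describing is roughly this (table, column, and file names are made up):

```sql
-- Sketch of the full workflow, assuming two text columns.
CREATE TEMP TABLE tmp_table (name text, document text);

-- Load the CSV into the temp table first; no constraint here,
-- so duplicates in the file don't abort the COPY.
COPY tmp_table (name, document)
FROM '/path/to/data.csv' WITH (FORMAT csv);

-- Then insert only rows not already present in the main table.
INSERT INTO final_table (name, document)
SELECT DISTINCT t1.name, t1.document
FROM tmp_table t1
WHERE NOT EXISTS (
    SELECT 1
    FROM final_table t2
    WHERE (t2.name, t2.document)
          IS NOT DISTINCT FROM (t1.name, t1.document)
);
```

It's that final INSERT ... SELECT, scanning the large partitioned final_table for every candidate row, that is the slow step.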

Does anyone have any idea (really, anything would be great) how to solve
this situation? I need to ignore duplicates instead of getting an error.

I'm using PostgreSQL 9.4, so I can't use "ON CONFLICT", and upgrading is not
an option.

Thanks and Kind Regards!

Leandro Guimarães
