Re: How to insert a bulk of data with unique-violations very fast

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Torsten Zühlsdorff <foo(at)meisterderspiele(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to insert a bulk of data with unique-violations very fast
Date: 2010-06-03 23:03:37
Message-ID: AANLkTim0X5AyDz6K7uPdwCwoqNNSOjAUEsVJ9maS8aO0@mail.gmail.com
Lists: pgsql-performance

2010/6/1 Torsten Zühlsdorff <foo(at)meisterderspiele(dot)de>:
> Hello,
>
> i have a set of unique data of about 150,000,000 rows. Regularly I receive a
> new list of data that contains many times more rows than the already stored
> set, often around 2,000,000,000 rows. This list contains many duplicates and
> often also the already stored data.
> I want to store only those entries that are not already stored, and I do not
> want to store duplicates. Example:
>
> Already stored set:
> a,b,c
>
> Given set:
> a,b,a,c,d,a,c,d,b
>
> Expected set after import:
> a,b,c,d
>
> I am now looking for a faster way to do the import. At the moment I COPY the
> new data into a table 'import', then remove the duplicates and insert every
> row that is not already known; after that, 'import' is truncated.
>
> Is there a faster way? Should I just insert every row and ignore it if the
> unique constraint fails?
>
> Here is the simplified table schema; in real life it is partitioned:
> test=# \d urls
>                          Table "public.urls"
>  Column |  Type   |                       Modifiers
> --------+---------+-------------------------------------------------------
>  url_id | integer | not null default nextval('urls_url_id_seq'::regclass)
>  url    | text    | not null
> Indexes:
>    "urls_url" UNIQUE, btree (url)
>    "urls_url_id" btree (url_id)
>
> Thanks for every hint or advice! :)

I think you should have a look at pgloader. It does COPY with error
handling, and it is very effective.

http://pgloader.projects.postgresql.org/
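
For reference, a minimal sketch of the staging-table merge Torsten describes
(COPY into a staging table, then insert only the rows that are not already
known) could look roughly like this. It is an illustration only, under assumed
names: the staging table import_urls and the file path are placeholders, not
taken from this thread.

    BEGIN;

    -- Load the raw input into an unindexed staging table.
    -- (Placeholder name and path; use \copy in psql for a client-side file.)
    CREATE TEMP TABLE import_urls (url text);
    COPY import_urls (url) FROM '/tmp/new_urls.txt';

    -- Insert only the URLs that are not already stored, deduplicating the
    -- input at the same time: the anti-join keeps rows with no match in urls.
    INSERT INTO urls (url)
    SELECT DISTINCT i.url
    FROM import_urls i
    LEFT JOIN urls u ON u.url = i.url
    WHERE u.url IS NULL;

    COMMIT;

On later PostgreSQL versions (9.5 and up), INSERT ... ON CONFLICT DO NOTHING
against the unique index on url is another way to skip already-stored rows
without a separate anti-join.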

>
> Greetings from Germany,
> Torsten
> --
> http://www.dddbl.de - a database layer that abstracts work with 8
> different database systems,
> separates queries from applications, and can automatically evaluate
> query results.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
