From: Torsten Zühlsdorff <foo(at)meisterderspiele(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to insert a bulk of data with unique-violations very fast
Date: 2010-06-09 07:45:46
Message-ID: hungra$vlt$2@news.eternal-september.org
Lists: pgsql-performance
Pierre C wrote:
>
>> Within the data to import, most rows have 20 to 50 duplicates.
>> Sometimes much more, sometimes less.
>
> In that case (source data has lots of redundancy), after importing the
> data chunks in parallel, you can run a first pass of de-duplication on
> the chunks, also in parallel, something like:
>
> CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
>
> or you could compute some aggregates, counts, etc. Same as before, no
> WAL needed, and you can use all your cores in parallel.
>
> From what you say this should reduce the size of your imported data by
> a lot (and hence the time spent in the non-parallel operation).
Thank you very much for this advice. I've tried it in another project
with similar import problems. This really sped up the import.
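
For the archives, here is a minimal sketch of the whole flow as I
understand it. The chunk tables foo_1/foo_2 follow Pierre's naming;
the target table "foo" and its unique column "key" are just
placeholders for illustration:

  -- stage the raw chunks in parallel (e.g. via COPY), then
  -- de-duplicate each chunk, still in parallel:
  CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
  CREATE TEMP TABLE foo_2_dedup AS SELECT DISTINCT * FROM foo_2;

  -- finally merge each de-duplicated chunk into the real table,
  -- skipping rows that would violate the unique constraint:
  INSERT INTO foo
  SELECT d.*
  FROM   foo_1_dedup d
  WHERE  NOT EXISTS (SELECT 1 FROM foo WHERE foo.key = d.key);

Only the final merge has to run serially; everything before it can use
all cores.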
Thanks to everyone for your time and help!
Greetings,
Torsten
--
http://www.dddbl.de - a database layer that abstracts working with 8
different database systems, separates queries from applications, and
can automatically evaluate the query results.