From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | Torsten Zühlsdorff <foo(at)meisterderspiele(dot)de> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: How to insert a bulk of data with unique-violations very fast |
Date: | 2010-06-03 20:09:38 |
Message-ID: | AANLkTilj2xHAZMnQ-To3lod3mMPw1DBU_8EkRwTC29qb@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thu, Jun 3, 2010 at 11:19 AM, Torsten Zühlsdorff
<foo(at)meisterderspiele(dot)de> wrote:
> Scott Marlowe schrieb:
>>
>> On Tue, Jun 1, 2010 at 9:03 AM, Torsten Zühlsdorff
>> <foo(at)meisterderspiele(dot)de> wrote:
>>>
>>> Hello,
>>>
>>> i have a set of unique data which about 150.000.000 rows. Regullary i get
>>> a
>>> list of data, which contains multiple times of rows than the already
>>> stored
>>> one. Often around 2.000.000.000 rows. Within this rows are many
>>> duplicates
>>> and often the set of already stored data.
>>> I want to store just every entry, which is not within the already stored
>>> one. Also i do not want to store duplicates. Example:
>>
>> The standard method in pgsql is to load the data into a temp table
>> then insert where not exists in old table.
>
> Sorry, i didn't get it. I've googled some examples, but no one match at my
> case. Every example i found was a single insert which should be done or
> ignored, if the row is already stored.
>
> But in my case i have a bulk of rows with duplicates. Either your tipp
> doesn't match my case or i didn't unterstand it correctly. Can you provide a
> simple example?
create table main (id int primary key, info text);
create table loader (id int, info text);
insert into main values (1,'abc'),(2,'def'),(3,'ghi');
insert into loader values (1,'abc'),(4,'xyz');
select * from main;
id | info
----+------
1 | abc
2 | def
3 | ghi
(3 rows)
select * from loader;
id | info
----+------
1 | abc
4 | xyz
(2 rows)
insert into main select * from loader except select * from main;
select * from main;
id | info
----+------
1 | abc
2 | def
3 | ghi
4 | xyz
(4 rows)
Note that for the where not exists to work the fields would need to be
all the same, or you'd need a more complex query. If the info field
here was different you'd get an error an no insert / update. For that
case you might want to use "where not in":
insert into main select * from loader where id not in (select id from main);
If you wanted the new rows to update pre-existing rows, then you could
run an update first where the ids matched, then the insert where no id
matches.
From | Date | Subject | |
---|---|---|---|
Next Message | Anj Adu | 2010-06-03 20:37:23 | Re: slow query performance |
Previous Message | Kevin Grittner | 2010-06-03 20:01:03 | Re: Weird XFS WAL problem |