| From: | George Nychis <gnychis(at)cmu(dot)edu> |
|---|---|
| To: | pgsql-general(at)postgresql(dot)org |
| Subject: | insert only unique values in to a table, ignore rest? |
| Date: | 2007-01-08 20:58:24 |
| Message-ID: | 45A2B070.9090601@cmu.edu |
| Lists: | pgsql-general |
Hi,
I have approximately 2 billion data entries that I would like to insert into a database.
Each entry consists of:
INT BOOLEAN INT BOOLEAN
I want to populate a table that contains only the unique rows; all other data should be
thrown out. I expect a significant fraction of the insertions to fail due to
unique-constraint violations. The unique constraint is on the two integers, not on the
booleans.
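For reference, a minimal sketch of such a table in PostgreSQL (the column names are assumptions; the post only gives the types, INT BOOLEAN INT BOOLEAN, and the table name `ipconn` from the mysql command below):

```sql
-- hypothetical column names; the unique constraint covers
-- the two integer columns, per the post
CREATE TABLE ipconn (
    int1  integer NOT NULL,
    bool1 boolean,
    int2  integer NOT NULL,
    bool2 boolean,
    UNIQUE (int1, int2)
);
```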
Using MySQL, I was able to do this with the following command, run once per data file
(25574 data files total):

mysql -e "use connectivity; LOAD DATA INFILE '/path/to/file' IGNORE INTO TABLE ipconn
FIELDS TERMINATED BY ' ';"
What I *think* mysql did was sort each data file and do a sort of merge sort between the
data I was inserting and the data in the database. It would insert the first unique
instance of a row it saw, and reject all other insertions that violated the unique
constraint due to the "IGNORE".
From what I understand, this functionality does not exist in PostgreSQL. Fine, I certainly
can't change that, but I am looking for a comparable solution that scales to the size of
my data.
One solution is to create a temporary staging table, insert all 2 billion rows into it, and
then copy only the distinct entries into the real table. Would this amount to one massive
sort?
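The staging-table approach could be sketched like this (column names are the same assumed ones as above; `DISTINCT ON` is a PostgreSQL extension that keeps one arbitrary row per `(int1, int2)` pair when no `ORDER BY` is given):

```sql
-- unconstrained staging table with the same layout as the data files
CREATE TABLE ipconn_staging (
    int1  integer,
    bool1 boolean,
    int2  integer,
    bool2 boolean
);

-- bulk-load each space-delimited data file
COPY ipconn_staging FROM '/path/to/file' WITH DELIMITER ' ';

-- keep one row per (int1, int2) pair and move it to the real table
INSERT INTO ipconn
SELECT DISTINCT ON (int1, int2) int1, bool1, int2, bool2
FROM ipconn_staging;
```

Note that `COPY` into an unconstrained table avoids per-row constraint checking during the load; the deduplication then happens in a single pass, which is effectively the big sort described above.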
Is this the only/best solution using postgresql?
Thanks!
George