Quick Links

Re: COPY from .csv File and Remove Duplicates

From:	"David Johnston" <polobo(at)yahoo(dot)com>
To:	"'Rich Shepard'" <rshepard(at)appl-ecosys(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: COPY from .csv File and Remove Duplicates
Date:	2011-08-12 18:32:02
Message-ID:	004e01cc591e$219bc790$64d356b0$@yahoo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

INSERT INTO chem_too
(lab_nbr, loc_name, sample_date, param, quant, units, qa_qc,
easting, northing, remark)
SELECT *
FROM chemistry
Natural Inner join (
SELECT loc_name, sample_date, param, Count(*) as duplicate_count
FROM chemistry
GROUP BY loc_name, sample_date, param) grouped
WHERE duplicate_count > 1;

psql -f cp_dups.sql nevada
psql:cp_dups.sql:10: ERROR: INSERT has more expressions than target columns
LINE 4: SELECT *
^
The INSERT INTO clause lists all columns in both tables so I fail to
understand to what the error message refers.

Rich

----------------------------------------------------------------------
The INSERT and SELECT portions of the query are independent; the column
listing in the INSERT does not affect the select. The only thing that
matters is that the DATA TYPE of the matching pairs are the same. For
instance:

INSERT INTO table1(field1, field2)
SELECT 'One' AS fieldA, 'Two'

Would work assuming that both field1 and field2 are text; fieldA gets
inserted into field1 and the unnamed second column gets inserted into
field2.

SELECT * --> means uses every column from the FROM/JOIN tables. Try issuing
the SELECT by itself and see what columns (and in what order) it returns.
In this case it will, at minimum, return a "duplicate_count" column which is
not going to be in the "chem_too" table. Thus, you need to replace the "*"
in the SELECT with the specific columns that correspond to the columns
listed in to INSERT portion of the query. Likely this means

INSERT INTO chem_too (lab_nbr, loc_name, sample_date, param, quant, units,
qa_qc, easting, northing, remark)
SELECT lab_nbr, loc_name, sample_date, param, quant, units, qa_qc, easting,
northing, remark -- in the original "*" expansion the duplicate_count field
is present AND quite possibly the order of the fields is messed up
FROM chemistry
NATURAL JOIN ( ... ) WHERE duplicate_count > 1;

IF the chemistry table is a true copy of the chem_too table you can shorten
the above in two ways; though using NATURAL JOIN may cause them to fail due
to column order. The above is the safest way to write the query -
explicitly specify all fields in both the INSERT and the SELECT portions of
the query.

INSERT INTO chem_too (lab_nbr, loc_name, sample_date, param, quant, units,
qa_qc, easting, northing, remark)
SELECT chemistry.*
FROM chemistry NATURAL JOIN ( ... ) WHERE duplicate_count > 1

OR, even further,

INSERT INTO chem_too
SELECT chemistry.*
FROM chemistry NATURAL JOIN ( ... ) WHERE duplicate_count > 1;

David J.

In response to

Re: COPY from .csv File and Remove Duplicates at 2011-08-12 18:03:15 from Rich Shepard

Responses

Re: COPY from .csv File and Remove Duplicates [RESOLVED] at 2011-08-12 22:46:58 from Rich Shepard

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2011-08-12 18:49:17	Re: PD_ALL_VISIBLE flag warnings
Previous Message	Vincent Veyron	2011-08-12 18:31:06	Re: Indicating DEFAULT values in INSERT statement