Quick Links

identifying duplicates in table with redundancies

From:	Tarlika Elisabeth Schmitz <postgresql(at)numerixtechnology(dot)de>
To:	pgsql-sql(at)postgresql(dot)org
Subject:	identifying duplicates in table with redundancies
Date:	2010-09-23 21:39:40
Message-ID:	20100923223940.5019475e@dick.coachhouse
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

I loaded data from a spread into a interim table so I can analyze the
quality of the data.

The table contains an entry for every student (250K records) and his
trainer. Eventually, I want to extract a unique list of trainers from
it. But first of all I want to check for duplicates:

1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name

I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it.

CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);

====
EXAMPLE DATA

22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====

-- outputs trainer ids which appear under different names
select trainer_id, trainer_name from
(
-- different id/name combinations
select distinct on (trainer_name) trainer_id, trainer_name
from student
where trainer_id in
(
-- trainer ids with appearing with different names
select distinct on (id) id
from
(
-- distinct trainer id-name
select distinct on (trainer_id,trainer_name)
trainer_id as id,
trainer_name as name from student
) as trainer
group by trainer.id
having count (trainer.name) > 1
)
) as y
order by trainer_id

Best Regards,
Tarlika Elisabeth Schmitz

Responses

Re: identifying duplicates in table with redundancies at 2010-09-24 10:22:21 from Oliveiros d'Azevedo Cristina

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Nicholas I	2010-09-24 09:15:54	find and replace the string within a column
Previous Message	Rob Sargent	2010-09-23 18:46:55	Re: pg_config -less