Removing duplicates

From: Johann Spies <johann(dot)spies(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Removing duplicates
Date: 2013-07-10 09:11:11
Message-ID: CAGZ55DQmPeF1F_No2riywGU8iAs+Gy2=h_wfA38b2=ciM4JEfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have used this method successfully on another table but this one is not
working.

I have a table of nearly 800 million records with some duplicates in.

Here is an example:

select rart_id, r9, ra, ry, rw, rv, rp, rs, ri
from isi.rcited_ref
where rart_id = 'A1986D733500019';

renders a result of 72 records. When I do

select count(id), rart_id, r9, ra, ry, rw, rv, rp, rs, ri
from isi.rcited_ref
where rart_id = 'A1986D733500019'
group by rart_id, r9, ra, ry, rw, rv, rp, rs, ri;

It shows that there are 36 duplicates with this rart_id.

So as a test I did the following (the id-field is the primary key):

DELETE FROM isi.rcited_ref q
WHERE EXISTS(SELECT 1 FROM isi.rcited_ref q1
WHERE q1.id != q.id
AND q.rart_id = q1.rart_id
AND q.r9 = q1.r9
AND q.ra = q1.ra
AND q.ry = q1.ry
AND q.rw = q1.rw
AND q.rv = q1.rv
AND q.rp = q1.rp
AND q.rs = q1.rs
AND q.rart_id = 'A1986D733500019'
);

But that deletes none. And I cannot see what went wrong.

I have also tried the same query with ctid without success:

DELETE FROM isi.rcited_ref q
WHERE EXISTS(SELECT 1 FROM isi.rcited_ref q1
WHERE q1.ctid < q.ctid
AND q.rart_id = q1.rart_id
AND q.r9 = q1.r9
AND q.ra = q1.ra
AND q.ry = q1.ry
AND q.rw = q1.rw
AND q.rv = q1.rv
AND q.rp = q1.rp
AND q.rs = q1.rs
AND q.rart_id = 'A1986D733500019'
);

The size of the table makes it difficult to use a 'group by' method to
delete all duplcates.

What am I doing wrong?

Regards
Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Muhammad Bashir Al-Noimi 2013-07-10 09:59:27 Re: Force ssl connection
Previous Message Michael Paquier 2013-07-10 06:14:34 Re: Support for Foreign keys with arrays