Re: Removing duplicates

From: Ketana Patel <contactketana(at)yahoo(dot)co(dot)in>
To: Johann Spies <johann(dot)spies(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Removing duplicates
Date: 2013-07-10 18:04:48
Message-ID: 1373479488.45999.YahooMailNeo@web193801.mail.sg3.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

You can try following query on test database to remove duplicates.

DELETE FROM isi.rcited_ref a
    WHERE a.ctid <> (SELECT min(ctid) FROM isi.rcited_ref b
          WHERE a.rart_id = b.rart_id
          AND a.r9 = b.r9
          AND a.ra = b.ra
          AND a.ry = b.ry
          AND a.rw = b.rw
          AND a.rv = b.rv
          AND a.rp = b.rp
          AND a.rs = b.rs
          AND a.rart_id = 'A1986D733500019'
          );

Regards,
Ketana

________________________________
From: Johann Spies <johann(dot)spies(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Sent: Wednesday, 10 July 2013 5:11 AM
Subject: [GENERAL] Removing duplicates

I have used this method successfully on another table but this one is not working.

I have a table of nearly 800 million records with some duplicates in.

Here is an example:

select rart_id, r9, ra, ry, rw, rv, rp, rs, ri
from isi.rcited_ref
where rart_id = 'A1986D733500019';

renders a result of 72 records. When I do

select count(id), rart_id, r9, ra, ry, rw, rv, rp, rs, ri
from isi.rcited_ref
where rart_id = 'A1986D733500019'
group by rart_id, r9, ra, ry, rw, rv, rp, rs, ri;

It shows that there are 36 duplicates with this rart_id.

So as a test I did the following (the id-field is the primary key):

DELETE FROM isi.rcited_ref q
WHERE EXISTS(SELECT 1 FROM   isi.rcited_ref q1
   WHERE  q1.id != q.id
   AND    q.rart_id = q1.rart_id
   AND    q.r9 = q1.r9
   AND    q.ra = q1.ra
   AND    q.ry = q1.ry
   AND    q.rw = q1.rw
   AND    q.rv = q1.rv
   AND    q.rp = q1.rp
   AND    q.rs = q1.rs
   AND    q.rart_id = 'A1986D733500019'
    );

But that deletes none.  And I cannot see what went wrong.

I have also tried the same query with ctid without success:

DELETE FROM isi.rcited_ref q
WHERE EXISTS(SELECT 1 FROM   isi.rcited_ref q1
   WHERE  q1.ctid < q.ctid
   AND    q.rart_id = q1.rart_id
   AND    q.r9 = q1.r9
   AND    q.ra = q1.ra
   AND    q.ry = q1.ry
   AND    q.rw = q1.rw
   AND    q.rv = q1.rv
   AND    q.rp = q1.rp
   AND    q.rs = q1.rs
   AND    q.rart_id = 'A1986D733500019'
    );

The size of the table makes it difficult to use a 'group by'  method to delete all duplcates.

What am I doing wrong?

Regards
Johann

--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian.Vondendriesch 2013-07-10 19:19:26 Re: Single Line Query Logging
Previous Message Kevin Grittner 2013-07-10 17:21:51 Re: function with unknown params