From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | Miguel Miranda <miguel(dot)mirandag(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: delete duplicates takes too long |
Date: | 2009-04-24 23:50:23 |
Message-ID: | dcc563d10904241650h762c7e4cpd4401ad156773fed@mail.gmail.com |
Lists: | pgsql-general |
On Fri, Apr 24, 2009 at 5:37 PM, Miguel Miranda
<miguel(dot)mirandag(at)gmail(dot)com> wrote:
> Hi, I have a table:
> CREATE TABLE public.cdr_ama_stat (
> id int4 NOT NULL DEFAULT nextval('cdr_ama_stat_id_seq'::regclass),
> abonado_a varchar(30) NULL,
> abonado_b varchar(30) NULL,
> fecha_llamada timestamp NULL,
> duracion int4 NULL,
> puerto_a varchar(4) NULL,
> puerto_b varchar(4) NULL,
> tipo_llamada char(1) NULL,
> processed int4 NULL,
> PRIMARY KEY(id)
> )
> GO
> CREATE INDEX kpi_fecha_llamada
> ON public.cdr_ama_stat(fecha_llamada)
>
> There should be unique values for abonado_a, abonado_b, fecha_llamada,
> duracion in every row. Googling around, I found how to delete duplicates on
> the postgresonline site,
Then why not have a unique index on those columns together?
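A rough sketch of what that index could look like on the table quoted above. The index name is made up for illustration. Two caveats: the statement will fail as long as duplicates are still in the table, and since all four columns are declared NULL, rows with a NULL in any of them are treated as distinct and will not be rejected.

```sql
-- Hypothetical index name. Run this only after the existing duplicates are gone,
-- otherwise CREATE UNIQUE INDEX fails because it cannot build a unique index
-- over duplicated keys.
CREATE UNIQUE INDEX cdr_ama_stat_call_uniq
    ON public.cdr_ama_stat (abonado_a, abonado_b, fecha_llamada, duracion);
```

Once it exists, a duplicate insert raises a unique-violation error instead of quietly piling up.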
> so I ran the following query (let's say I want to know how many duplicates
> exist for 2009-04-18 before deleting them):
>
> SELECT * FROM cdr_ama_stat
> WHERE id NOT IN
> (SELECT MAX(dt.id)
> FROM cdr_ama_stat As dt
> WHERE dt.fecha_llamada BETWEEN '2009-04-18' AND '2009-04-18'::timestamp +
> INTERVAL '1 day'
> GROUP BY dt.abonado_a, dt.abonado_b,dt.fecha_llamada,dt.duracion)
> AND fecha_llamada BETWEEN '2009-04-18' AND '2009-04-18'::timestamp +
> INTERVAL '1 day'
>
> My problem is that the query takes forever, number of rows:
Have you tried throwing more work_mem at the problem?
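For example, work_mem can be raised for just the current session before running the query. The 256MB figure is only an illustrative assumption, not a recommendation; the limit applies per sort or hash operation, so size it to the RAM you actually have. The reason it may help here is that the planner can only hash the result of a NOT IN subquery when it expects that result to fit in work_mem; otherwise the subquery result gets scanned again for each outer row.

```sql
-- 256MB is an illustrative value, not a recommendation.
SET work_mem = '256MB';   -- affects only the current session
SHOW work_mem;            -- confirm the setting took effect

-- ... run the duplicate-finding query here ...

RESET work_mem;           -- back to the server default
```

Comparing EXPLAIN output before and after the change should show whether the plan switched to a hashed subplan.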
The other method to do this uses no group by, just a join clause.
Depending on the number of dupes it can be faster or slower.
delete from table x where x.id in
(select a.id from table a join table b on (a.somefield=b.somefield
and a.id < b.id))
Or something like that.
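Spelled out against the table from the original post, the join approach might look roughly like the sketch below. It uses DELETE ... USING, which is a PostgreSQL extension, in place of the IN subquery, and it assumes the four columns named in the question are what define a duplicate. Like the MAX(id) query, it keeps the row with the highest id in each group. Unlike GROUP BY, the equality joins never match NULLs, so rows with a NULL in any of those columns are left alone.

```sql
-- Sketch only: deletes every row that has a twin with a higher id,
-- so the row with the largest id in each duplicate group survives.
BEGIN;

DELETE FROM cdr_ama_stat a
USING cdr_ama_stat b
WHERE a.abonado_a     = b.abonado_a
  AND a.abonado_b     = b.abonado_b
  AND a.fecha_llamada = b.fecha_llamada
  AND a.duracion      = b.duracion
  AND a.id < b.id
  -- Restrict to one day, using a half-open range so midnight of the
  -- following day is not included.
  AND a.fecha_llamada >= '2009-04-18'
  AND a.fecha_llamada <  '2009-04-19';

-- Check the reported row count first; replace ROLLBACK with COMMIT
-- once the number looks right.
ROLLBACK;
```

Running it inside a transaction that ends in ROLLBACK gives a dry run: the DELETE reports how many rows it would remove without changing the table.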