Re: How to do faster DML

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to do faster DML
Date: 2024-02-03 20:29:47
Message-ID: 20240203202947.u45hpyiwzuodbav2@hjp.at
Lists: pgsql-general

On 2024-02-03 19:25:12 +0530, Lok P wrote:
> Apologies. One correction: the query is like below, i.e. the filter will be
> on ctid, which I believe is the equivalent of rowid in Oracle, so we will not
> need the index on the ID column then.
>
> But it still runs long, so is there any other way to make the duplicate
> removal faster?
>
> Also wondering: could the index creation, which took ~2.5hrs+, have been made
> faster by allowing more db resources through some session-level db parameter
> setting?
>
> create table TAB1_New
> as
> SELECT  * from TAB1 A
> where CTID in
>       (select min(CTID) from TAB1
>       group by ID having count(ID)>=1 );

That »having count(ID)>=1« seems redundant to me. Surely every id which
occurs in the table occurs at least once?
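
For example, dropping the redundant HAVING clause would leave an equivalent
query (this restated form is mine, not from the original mail):

create table TAB1_New
as
SELECT  * from TAB1 A
where CTID in
      (select min(CTID) from TAB1
      group by ID);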

Since you want ID to be unique, I assume it is already almost unique, so
only a small fraction of the ids will be duplicates. I would therefore
start by creating a list of the duplicates:

create table tab1_dups as
select id, count(*) from tab1 group by id having count(*) > 1;

This will still take some time because it needs to build a temporary
structure large enough to hold a count for each individual id. But at
least then you'll have a much smaller table to use for further cleanup.
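
One possible next step (my own sketch, not part of the original reply) would be
to use that small list to delete every row except the lowest-ctid one for each
duplicated id, for example:

-- sketch only: assumes tab1_dups was created as above
delete from tab1 t
using tab1_dups d
where t.id = d.id
  and t.ctid <> (select min(ctid) from tab1 where id = d.id);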

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp(at)hjp(dot)at  |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
