RE: UPDATE on 20 Million Records Transaction or not?

From: Jason Ralph <jralph(at)affinitysolutions(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, Ganesh Korde <ganeshakorde(at)gmail(dot)com>
Cc: Ron <ronljohnsonjr(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: UPDATE on 20 Million Records Transaction or not?
Date: 2020-06-23 20:54:06
Message-ID: BN7PR04MB3826DE611F9BCE758CFF79E9D0940@BN7PR04MB3826.namprd04.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>Just update them and be done with it. Do the work in batches if it doesn't matter that concurrent accesses to the table might see some >rows that have old value and some new. Given you are on PG11 and can commit within a function, you could adapt something like this to >just run until done. Oh, if you have not tuned the autovacuum process much, depending on the total number of rows in your table, it could >be good to manually vacuum once or twice during this process so that space is reused.

>*the below was shared by Andreas Krogh on one of these mailing lists about 6 months back.

>do $_$
>declare
> num_rows bigint;
>begin
> loop
> delete from YourTable where id in
> (select id from YourTable where id < 500 limit 100);
> commit;
> get diagnostics num_rows = row_count;
> raise notice 'deleted % rows', num_rows;
> exit when num_rows = 0;
> end loop;
> end;$_$;

Thanks to all the suggestions, I really like the function, I will test this. I have autovacuum fully tuned for this table, so should be good.

Jason Ralph

From: Michael Lewis <mlewis(at)entrata(dot)com>
Sent: Tuesday, June 23, 2020 1:43 PM
To: Ganesh Korde <ganeshakorde(at)gmail(dot)com>
Cc: Ron <ronljohnsonjr(at)gmail(dot)com>; pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: UPDATE on 20 Million Records Transaction or not?

>Are you updating every row in the table?
No I am using an update like so: UPDATE members SET regdate='2038-01-18' WHERE regdate='2020-07-07'
DB=# select count(*) from members where regdate = '2020-07-07';
count
----------
17333090
(1 row)

Just update them and be done with it. Do the work in batches if it doesn't matter that concurrent accesses to the table might see some rows that have old value and some new. Given you are on PG11 and can commit within a function, you could adapt something like this to just run until done. Oh, if you have not tuned the autovacuum process much, depending on the total number of rows in your table, it could be good to manually vacuum once or twice during this process so that space is reused.

*the below was shared by Andreas Krogh on one of these mailing lists about 6 months back.

do $_$
declare
num_rows bigint;
begin
loop
delete from YourTable where id in
(select id from YourTable where id < 500 limit 100);
commit;
get diagnostics num_rows = row_count;
raise notice 'deleted % rows', num_rows;
exit when num_rows = 0;
end loop;
end;$_$;
This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copy version.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Wolff, Ken L 2020-06-23 20:59:28 Re: Netapp SnapCenter
Previous Message Michael Lewis 2020-06-23 20:51:58 Re: Persistent Connections