Re: Batch update million records in prd DB

From: Yi Sun <yinan81(at)gmail(dot)com>
To: Kristjan Mustkivi <sonicmonkey(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Batch update million records in prd DB
Date: 2021-03-02 10:43:22
Message-ID: CABWY_HCWdaRxo_pqWxt1Qn-1gHMBUv5J2sp_Sa9svrdM9YZXsQ@mail.gmail.com
Lists: pgsql-general

Hi Kristjan,

Thank you for this information.

Regarding the memory issue you describe ("the memory is slowly eaten away
when doing updates within a plpgsql loop"): I want to check whether it
exists in our current PostgreSQL version. Also, to get the developers to
change to a Python for loop, I need to show them proof. How can I reproduce
the issue and watch the memory slowly being eaten away, please? Is there an
OS-level command or a psql command to verify it? Thanks
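For what it's worth, one OS-level way to observe this on Linux (a sketch, not a definitive recipe: it assumes a Linux /proc filesystem, and the backend pid would come from running `SELECT pg_backend_pid();` in the psql session that executes the DO block) is to poll the backend's resident set size while the loop runs:

```python
import time

def rss_kb(pid: int) -> int:
    """Resident set size of a process in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # line looks like: "VmRSS:     12345 kB"
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS line found for pid {pid}")

def watch(pid: int, interval_sec: float = 5.0, samples: int = 60) -> None:
    """Print the backend's RSS every interval_sec while the DO block runs."""
    for _ in range(samples):
        print(f"{time.strftime('%H:%M:%S')} pid={pid} VmRSS={rss_kb(pid)} kB")
        time.sleep(interval_sec)
```

If VmRSS keeps climbing for the duration of the DO block and only drops once the block finishes, that reproduces the behaviour described below. `top -p <pid>` or `ps -o rss= -p <pid>` at the shell would show the same numbers.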

On Tue, Mar 2, 2021 at 5:26 PM Kristjan Mustkivi <sonicmonkey(at)gmail(dot)com> wrote:

> Hi Yi,
>
> I found that in Postgres the memory is slowly eaten away when doing
> updates within a plpgsql loop. It only gets released once the whole
> block completes. While this is fine for small tables, you will
> eventually run out of memory on really big ones. The working approach
> was to do the loop in e.g. a Python script that issued the DML
> statements and also called commit. Several million rows is fortunately
> a relatively small number to update, but once you get to billions this
> approach would likely not work. Note that after each batch you should
> also run VACUUM before starting a new one, to avoid significant table
> bloat.
>
> BR,
>
> Kristjan
>
> On Wed, Feb 24, 2021 at 3:01 PM Yi Sun <yinan81(at)gmail(dot)com> wrote:
> >
> > Hello,
> >
> > Now we need to update several million records in a table in our prd
> > DB. If we batch-update 1000 records and commit after each batch, as in
> > the sample script below, will it affect the prd application?
> >
> > Sample script:
> >
> > DO $MAIN$
> > DECLARE
> >   affect_count integer;
> >   chunk_size CONSTANT integer := 1000;
> >   sleep_sec CONSTANT numeric := 0.1;
> > BEGIN
> >   LOOP
> >     UPDATE tbl a
> >     SET name = ''
> >     WHERE a.id IN (SELECT id
> >                    FROM tbl b
> >                    WHERE name IS NULL
> >                    LIMIT chunk_size);
> >
> >     GET DIAGNOSTICS affect_count = ROW_COUNT;
> >
> >     -- affect_count is NULL before the first UPDATE, so the exit test
> >     -- belongs here, after GET DIAGNOSTICS, not at the top of the loop
> >     EXIT WHEN affect_count = 0;
> >
> >     COMMIT;
> >
> >     PERFORM pg_sleep(sleep_sec);
> >   END LOOP;
> > END;
> > $MAIN$;
> >
> > Thanks and best regards
>
>
>
> --
> Kristjan Mustkivi
>
> Email: kristjan(dot)mustkivi(at)gmail(dot)com
>
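[For the archive: the driver-script approach described above, with a commit after every batch and a VACUUM between batches, could be sketched roughly as below. This is an illustration only: it assumes a DB-API connection such as one from psycopg2, and reuses the tbl/name/id schema from the sample script; connection details are placeholders.]

```python
import time

# One UPDATE batch; %s is the DB-API placeholder for the chunk size.
BATCH_SQL = (
    "UPDATE tbl a SET name = '' "
    "WHERE a.id IN (SELECT id FROM tbl b WHERE name IS NULL LIMIT %s)"
)

def batch_update(conn, chunk_size=1000, sleep_sec=0.1):
    """Run batched UPDATEs over an open DB-API connection (e.g. from
    psycopg2.connect), committing after every batch so the backend
    releases memory and row locks, and vacuuming between batches to
    limit table bloat."""
    while True:
        with conn.cursor() as cur:
            cur.execute(BATCH_SQL, (chunk_size,))
            updated = cur.rowcount
        conn.commit()                  # the transaction ends here
        if updated == 0:
            break
        conn.autocommit = True         # VACUUM cannot run in a transaction
        try:
            with conn.cursor() as cur:
                cur.execute("VACUUM tbl")
        finally:
            conn.autocommit = False
        time.sleep(sleep_sec)
```

Called as e.g. `batch_update(psycopg2.connect("dbname=prd"), chunk_size=1000)`. Driving the loop from outside the server is what lets each batch be a separate transaction, which the plpgsql DO block in older PostgreSQL versions cannot do.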
