Re: UPDATE many records

From: Justin <zzzzz(dot)graf(at)gmail(dot)com>
To: Israel Brewster <ijbrewster(at)alaska(dot)edu>, "pgsql general (pgsql-general(at)postgresql(dot)org)" <pgsql-general(at)postgresql(dot)org>
Subject: Re: UPDATE many records
Date: 2020-01-06 19:05:04
Message-ID: CALL-XeMp8x8JNdpf4VsOqh_AgLHe8ZQd_P2psRGd4_9RAKQewg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

There are several ways to actually do this

If you have Postgresql 11 or higher we now have Create Procedure that
allows committing transactions, one draw back is it can not parallel from
inside the procedure
https://www.postgresql.org/docs/11/sql-createprocedure.html
https://severalnines.com/database-blog/overview-new-stored-procedures-postgresql-11

If its an older version then Python Script or other scripting language to
iterates over the data say 10 to 20K will do what you want

for i in list of IDs
begin ;
"UPDATE table_name SET changed_field=new_value() where ID @> int4range(i,
i+10000);
commit;

To create parallel process simple Python script or other scripting language
can be used to create many connections working the data in parallel but
given the simple update it will NOT help in performance, this will be
Hard disk IO bound, not process bound where parallelization helps

On Mon, Jan 6, 2020 at 1:36 PM Israel Brewster <ijbrewster(at)alaska(dot)edu>
wrote:

> Thanks to a change in historical data, I have a need to update a large
> number of records (around 50 million). The update itself is straight
> forward, as I can just issue an "UPDATE table_name SET
> changed_field=new_value();" (yes, new_value is the result of a stored
> procedure, if that makes a difference) command via psql, and it should
> work. However, due to the large number of records this command will
> obviously take a while, and if anything goes wrong during the update (one
> bad value in row 45 million, lost connection, etc), all the work that has
> been done already will be lost due to the transactional nature of such
> commands (unless I am missing something).
>
> Given that each row update is completely independent of any other row, I
> have the following questions:
>
> 1) Is there any way to set the command such that each row change is
> committed as it is calculated?
> 2) Is there some way to run this command in parallel in order to better
> utilize multiple processor cores, other than manually breaking the data
> into chunks and running a separate psql/update process for each chunk?
> Honestly, manual parallelizing wouldn’t be too bad (there are a number of
> logical segregations I can apply), I’m just wondering if there is a more
> automatic option.
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory
> Geophysical Institute - UAF
> 2156 Koyukuk Drive
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell: 907-328-9145
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-01-06 19:07:14 Re: UPDATE many records
Previous Message Israel Brewster 2020-01-06 18:36:22 UPDATE many records