Re: update faster way

From: Muhammad Usman Khan <usman(dot)k(at)bitnine(dot)net>
To: yudhi s <learnerdatabase99(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: update faster way
Date: 2024-09-17 05:36:50
Message-ID: CAPnRvGs0VBL=2Huz4qChHdh+wKO-CuimrKzZt3xFiRHxkAc5vA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

You can solve this problem using Citus in PostgreSQL, which is specifically
designed for parallelism

SELECT create_distributed_table('tab_part1', 'partition_key');
SELECT create_distributed_table('reference_tab', 'reference_key');

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;

On Sat, 14 Sept 2024 at 08:22, yudhi s <learnerdatabase99(at)gmail(dot)com> wrote:

> Hello,
> We have to update a column value(from numbers like '123' to codes like
> 'abc' by looking into a reference table data) in a partitioned table with
> billions of rows in it, with each partition having 100's millions rows. As
> we tested for ~30million rows it's taking ~20minutes to update. So if we go
> by this calculation, it's going to take days for updating all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Muhammad Usman Khan 2024-09-17 05:55:24 Re: load fom csv
Previous Message veem v 2024-09-17 03:55:59 Re: IO related waits