From: | Muhammad Usman Khan <usman(dot)k(at)bitnine(dot)net> |
---|---|
To: | yudhi s <learnerdatabase99(at)gmail(dot)com> |
Cc: | pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: update faster way |
Date: | 2024-09-17 05:36:50 |
Message-ID: | CAPnRvGs0VBL=2Huz4qChHdh+wKO-CuimrKzZt3xFiRHxkAc5vA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
You can solve this problem using Citus in PostgreSQL, which is specifically
designed for parallelism
SELECT create_distributed_table('tab_part1', 'partition_key');
SELECT create_distributed_table('reference_tab', 'reference_key');
UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;
On Sat, 14 Sept 2024 at 08:22, yudhi s <learnerdatabase99(at)gmail(dot)com> wrote:
> Hello,
> We have to update a column value(from numbers like '123' to codes like
> 'abc' by looking into a reference table data) in a partitioned table with
> billions of rows in it, with each partition having 100's millions rows. As
> we tested for ~30million rows it's taking ~20minutes to update. So if we go
> by this calculation, it's going to take days for updating all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>
From | Date | Subject | |
---|---|---|---|
Next Message | Muhammad Usman Khan | 2024-09-17 05:55:24 | Re: load fom csv |
Previous Message | veem v | 2024-09-17 03:55:59 | Re: IO related waits |