Re: update faster way

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: yudhi s <learnerdatabase99(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: update faster way
Date: 2024-09-14 17:23:31
Message-ID: 37bc4217078946546b62c6ca2f2e458b798b1f36.camel@cybertec.at
Lists: pgsql-general

On Sat, 2024-09-14 at 16:10 +0530, yudhi s wrote:
> > However, the strain on your system's resources and particularly the row
> > locks will impair normal database work.
> >
> > Essentially, you can either take an extended down time or perform the updates
> > in very small chunks with a very low "lock_timeout" over a very long period
> > of time.  If any of the batches fails because of locking conflicts, it has
> > to be retried.
> >
> > Investigate with EXPLAIN (ANALYZE) why the updates take that long.  It could
> > be a lame disk, tons of (unnecessary?) indexes or triggers, but it might just
> > as well be the join with the lookup table, so perhaps there is room for
> > improvement (more "work_mem" for a hash join?).
>
> We have mostly inserts/updates happening on the current day/live partition. So
> considering that, if we run batch updates (with a batch size of 1000) from
> five different sessions in parallel on different historical partitions, at any
> time they will lock 5000 rows and then commit. Also, those rows will not
> collide with each other. So do you think that approach can cause any locking
> issues?

The updates won't block each other. I thought that other database activity
might modify rows in these partitions. If that is not the case, you don't need
to worry about locks.

In that case I would also choose a much higher batch size.

You should make sure to back off every now and then and VACUUM the partition,
so that you avoid excessive table bloat.

> We will ensure that the update of the live partition occurs when we have the
> least activity. That way we will not need extended downtime. Please correct
> me if I am wrong.

That sounds right.

> We have never used lock_timeout, but do we need it in the above case?

It can be useful if your updating process is blocked by a lock from the
application. Setting the parameter to a low value will keep your update
from hanging for a long time and will throw an error instead.
Erroring out early reduces the danger of a deadlock.
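
For illustration, one batch could look something like this (the table "tab",
the "id" column and the SET list are only placeholders, and the 100ms value
is something you would have to tune for your workload):

SET lock_timeout = '100ms';   -- give up quickly instead of waiting on a lock

UPDATE tab SET col = 'new value'   -- placeholder for your real SET list
WHERE id >= 1 AND id < 1001;

If the UPDATE fails with a lock timeout error, roll back and retry that batch
a little later.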

> Regarding batch updates with a batch size of 1000, is there any method in
> Postgres (say, like the FORALL statement in Oracle) that will do the batch DML?
> Can you please guide me on how we can do it in Postgres?

I would certainly not perform the update row by row in PL/pgSQL code.
Perhaps something like this:

DO
$$DECLARE
   i bigint;
BEGIN
   FOR i IN 1..1000000 BY 100000 LOOP
      UPDATE tab SET ...
      WHERE id >= i AND id < i + 100000;

      COMMIT;
   END LOOP;
END;$$;

VACUUM tab;

Then repeat for the next million rows, and so on.
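
If you don't want to edit the range by hand each time, you could wrap the
loop in a procedure instead; this is only a sketch, and the procedure name,
the parameters and the SET list are made up:

CREATE PROCEDURE update_in_batches(p_from bigint, p_to bigint, p_step bigint)
LANGUAGE plpgsql AS
$$DECLARE
   i bigint := p_from;
BEGIN
   WHILE i <= p_to LOOP
      UPDATE tab SET col = 'new value'   -- placeholder for your real SET list
      WHERE id >= i AND id < i + p_step;

      COMMIT;   -- keep transactions short and release the row locks
      i := i + p_step;
   END LOOP;
END;$$;

CALL update_in_batches(1000001, 2000000, 100000);

VACUUM tab;

Note that the CALL must not run inside an explicit transaction block,
otherwise the COMMIT inside the procedure will fail.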

> And yes, we will need to see what happens in the update using EXPLAIN (ANALYZE).
> I was trying to see if we can run EXPLAIN (ANALYZE) without doing the actual
> update, but it seems that is not possible.

You can do it in a transaction and roll the transaction back.
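
For example (again, the SET list is just a placeholder):

BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE tab SET col = 'new value'
WHERE id >= 1 AND id < 100001;

ROLLBACK;

EXPLAIN (ANALYZE) really executes the UPDATE, which is why you have to roll
the transaction back.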

Yours,
Laurenz Albe
