From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: yudhi s <learnerdatabase99(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: update faster way
Date: 2024-09-14 17:23:31
Message-ID: 37bc4217078946546b62c6ca2f2e458b798b1f36.camel@cybertec.at
Lists: pgsql-general
On Sat, 2024-09-14 at 16:10 +0530, yudhi s wrote:
> > However, the strain on your system's resources and particularly the row
> > locks will impair normal database work.
> >
> > Essentially, you can either take an extended down time or perform the updates
> > in very small chunks with a very low "lock_timeout" over a very long period
> > of time. If any of the batches fails because of locking conflicts, it has
> > to be retried.
> >
> > Investigate with EXPLAIN (ANALYZE) why the updates take that long. It could
> > be a lame disk, tons of (unnecessary?) indexes or triggers, but it might as
> > well be the join with the lookup table, so perhaps there is room for
> > improvement (more "work_mem" for a hash join?)
>
> We have mostly inserts/updates happening on the current-day/live partition. So
> considering that, if we run batch updates (with a batch size of 1000) from
> five different sessions in parallel on different historical partitions, at any
> time they will lock 5000 rows and then commit, and those rows will not
> collide with each other. So do you think that approach could still cause
> locking issues?
The updates won't block each other. I thought that other database activity
might modify rows in these partitions. If that is not the case, you don't need
to worry about locks.
In that case I would also choose a much higher batch size.
You should make sure to back off every now and then and VACUUM the partition,
so that you avoid excessive table bloat.
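For example, assuming the historical partitions have names you can address
directly (the name below is just a placeholder), you can vacuum only the
partition you are working on:

VACUUM tab_2023_hist;   -- placeholder partition name; vacuums just this partition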
> We will ensure the update of the live partition occurs when we have the least
> activity. So in that way we will not need extended downtime. Please correct me if wrong.
That sounds right.
> We have never used lock_timeout though, but in the above case do we need lock_timeout?
It can be useful if your updating process is blocked by a lock from the
application. Setting the parameter to a low value will keep your update
from hanging for a long time and will throw an error instead.
Erroring out early reduces the danger of a deadlock.
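For illustration, a single batch run with a short "lock_timeout" could look
like this (the SET list and the id range are only placeholders, adjust them
to your schema):

SET lock_timeout = '100ms';            -- don't queue behind a lock for long

UPDATE tab
   SET some_column = 'new value'       -- placeholder SET list
 WHERE id >= 1 AND id < 100001;

If the statement fails with "canceling statement due to lock timeout", simply
retry the same batch a little later.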
> Regarding batch updates with a batch size of 1000, does any method exist
> in Postgres (say, like the FORALL statement in Oracle) which will do the batch DML?
> Can you please guide me here on how we can do it in Postgres?
I would certainly not perform the update row by row in PL/pgSQL code.
Perhaps something like this:
DO
$$DECLARE
   i bigint;
BEGIN
   FOR i IN 1..1000000 BY 100000 LOOP
      UPDATE tab SET ...                 -- your SET list goes here
         WHERE id >= i AND id < i + 100000;
      COMMIT;                            -- COMMIT in a DO block needs PostgreSQL 11+
   END LOOP;                             -- and no surrounding transaction block
END;$$;

VACUUM tab;
Then repeat for the next million rows, and so on.
> And yes, we will need to see what happens in the update using EXPLAIN ANALYZE.
> And I was trying to see if we can run EXPLAIN ANALYZE without doing the
> actual update, but it seems that is not possible.
You can do it in a transaction and roll the transaction back.
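For example (the SET list is just a placeholder):

BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE tab SET some_column = 'new value'   -- placeholder SET list
 WHERE id >= 1 AND id < 100001;

ROLLBACK;   -- the UPDATE is really executed (that is what ANALYZE measures),
            -- but its changes are discarded by the rollback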
Yours,
Laurenz Albe