Re: How to avoid UPDATE performance degradation in a transaction

From: Karl Düüna <karl(dot)dyyna(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: How to avoid UPDATE performance degradation in a transaction
Date: 2020-02-14 07:14:57
Message-ID: CAMNADoYOe-sK9W8TnaaAC7rimqQGcktauWmJfddG5961EH5RMA@mail.gmail.com
Lists: pgsql-performance

Thank you for the explanation.

That is pretty much what I suspected, but I held out hope that there was
some functionality I could use to clear the bloat as the transaction
progresses and bring the UPDATE time back down again.
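(For anyone finding this thread in the archives: the growth in per-UPDATE
time is easy to see with a sketch along the lines below. The table name and
loop count are made up for illustration; the real tables are wider and
trigger-driven.)

-- Minimal sketch of the effect: repeated single-row UPDATEs inside one
-- transaction get slower as superseded row versions pile up.
CREATE TABLE upd_demo (id int PRIMARY KEY, counter int);
INSERT INTO upd_demo VALUES (1, 0);

-- A DO block runs as a single transaction, so all of these updates share it.
DO $$
DECLARE
    t0 timestamptz;
BEGIN
    FOR i IN 1..20000 LOOP
        t0 := clock_timestamp();
        UPDATE upd_demo SET counter = counter + 1 WHERE id = 1;
        IF i % 5000 = 0 THEN
            RAISE NOTICE 'update %: % ms', i,
                1000 * extract(epoch FROM clock_timestamp() - t0);
        END IF;
    END LOOP;
END $$;
-- Running the same UPDATE the same number of times as separate
-- autocommitted statements stays roughly flat instead.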
"dont do that" is sensible, but much more easily said than done, as the in
the actual use case I have, the single row updates are caused by various
triggers running on separate operations -
which means I will have to muck about with conditional trigger disabling
and/or change a large part of the database logic around these tables.
But I guess that is a whole other issue.
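
(To make "conditional trigger disabling" concrete, the kind of thing I have
in mind is a session-level flag the trigger functions check — a rough sketch
only, with made-up function, table and setting names:)

CREATE OR REPLACE FUNCTION my_row_trigger_fn() RETURNS trigger AS $$
BEGIN
    -- current_setting(..., true) returns NULL instead of raising if unset
    IF current_setting('app.skip_row_triggers', true) = 'on' THEN
        RETURN NEW;          -- skip the per-row work for the bulk path
    END IF;
    -- ... the normal per-row logic would go here ...
    RETURN NEW;
END $$ LANGUAGE plpgsql;

-- In the bulk operation, scoped to the current transaction:
--   BEGIN;
--   SET LOCAL app.skip_row_triggers = 'on';
--   ... bulk updates ...
--   COMMIT;

ALTER TABLE ... DISABLE TRIGGER would also work, but it turns the trigger
off for every session, hence the mucking about.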

Anyway, thank you for your time and the explanation,
Karl

On Thu, 13 Feb 2020 at 23:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Karl Düüna <karl(dot)dyyna(at)gmail(dot)com> writes:
> > -- TL; DR;
> > UPDATE on a row takes relatively constant amount of time outside a
> > transaction block, but running UPDATE on a single row over and over
> inside
> > a transaction gets slower and slower as the number of UPDATE operations
> > increases.
>
> Yeah, that's unsurprising. Each new update creates a new version of
> its row. When you do them in separate transactions, then as soon as
> transaction N+1 commits the system can recognize that the row version
> created by transaction N is dead (no longer visible to anybody) and
> recycle it, allowing the number of row versions present on-disk to
> stay more or less constant. However, there's not equivalently good
> housekeeping for row versions created by a transaction that's still
> running. So when you do N updates in one transaction, there are going
> to be N doomed-but-not-yet-recyclable row versions on disk.
>
> Aside from the disk-space bloat, this is bad because the later updates
> have to scan through all the row versions created by earlier updates,
> looking for the version they're supposed to update. So you have an O(N^2)
> cost associated with that, which no doubt is what you're observing.
>
> There isn't any really good fix for this, other than "don't do that".
> David's nearby suggestion of using a temp table won't help, because
> this behavior is the same whether the table is temp or regular.
>
> In principle perhaps we could improve the granularity of dead-row
> detection, so that if a row version is both created and deleted by
> the current transaction, and we have no live snapshots that could
> see it, we could go ahead and mark the row dead. But it's not clear
> that that'd be worth the extra cost to do. Certainly no existing PG
> release tries to do it.
>
> regards, tom lane
>
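
(A footnote for the archives: the on-disk side of the explanation above is
easy to observe as well — a rough sketch, again with a made-up table name;
exact sizes will vary.)

CREATE TABLE bloat_demo (id int PRIMARY KEY, counter int);
INSERT INTO bloat_demo VALUES (1, 0);

SELECT pg_relation_size('bloat_demo');   -- baseline: a single 8 kB page

-- All of these updates happen in one transaction (the DO block), so every
-- superseded row version stays on disk until it ends; the table grows.
DO $$
BEGIN
    FOR i IN 1..50000 LOOP
        UPDATE bloat_demo SET counter = counter + 1 WHERE id = 1;
    END LOOP;
END $$;

SELECT pg_relation_size('bloat_demo');   -- now many pages rather than one

-- Done as separate autocommitted statements instead, the dead versions can
-- be pruned because the transaction that created them is no longer running,
-- and the table stays at more or less its original size.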
