Re: Slowing UPDATEs inside a transaction

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Matt Burke <mattblists(at)icritical(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slowing UPDATEs inside a transaction
Date: 2011-03-03 15:23:32
Message-ID: AANLkTi=PqSqh_EfdGCvauwn=GCunNG5-qbmPeXKYhQDT@mail.gmail.com
Lists: pgsql-performance

On Thu, Mar 3, 2011 at 8:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 3, 2011 at 9:13 AM, Matt Burke <mattblists(at)icritical(dot)com> wrote:
>> Hi. I've only been using PostgreSQL properly for a week or so, so I
>> apologise if this has been covered numerous times, however Google is
>> producing nothing of use.
>>
>> I'm trying to import a large amount of legacy data (billions of
>> denormalised rows) into a pg database with a completely different schema,
>> de-duplicating bits of it on-the-fly while maintaining a reference count.
>> The procedures to do this have proven easy to write, however the speed is
>> not pretty. I've spent some time breaking down the cause and it's come down
>> to a simple UPDATE as evidenced below:
>
> PostgreSQL uses MVCC, which means that transactions see a snapshot of
> the database as it existed at a certain point in time, usually the
> beginning of the current query.  Old row versions have to be kept
> around until they're no longer of interest to any still-running
> transaction.  Sadly, our ability to detect which row versions are
> still of interest is imperfect, so we sometimes keep row versions that
> are technically not required.  Unfortunately, repeated updates by the
> same transaction to the same database row are one of the cases that we
> don't handle very well - all the old row versions will be kept until
> the transaction commits.  I suspect if you look at the problem case
> you'll find that the table and index are getting bigger with every set
> of updates, whereas when you do the updates in separate transactions
> the size grows for a while and then levels off.
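
To make that concrete, here's a quick demonstration you can run in
psql (the table name t is made up for the example):

CREATE TABLE t AS SELECT g AS id, 0 AS n FROM generate_series(1, 100000) g;

BEGIN;
UPDATE t SET n = n + 1;  -- pass 1: ~100k new row versions
UPDATE t SET n = n + 1;  -- pass 2: another ~100k; pass 1's versions
                         -- can't be reclaimed until the commit
SELECT pg_size_pretty(pg_relation_size('t'));  -- grows with every pass
COMMIT;

Run the same updates as separate transactions and the relation size
levels off instead, because the dead versions become reclaimable.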

Another perspective on this is that the lack of explicit transaction
control in stored procedures contributes to the problem. Being able to
manage transaction state server-side would allow for a lot of
workarounds without forcing the processing into the client.
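
Until then, the workaround is to commit from the client in batches,
along these lines (table name and id ranges are hypothetical):

-- each chunk commits independently, so dead row versions from
-- earlier chunks can be reclaimed while the load is still running
BEGIN;
UPDATE big_table SET n = n + 1 WHERE id BETWEEN 1 AND 100000;
COMMIT;

BEGIN;
UPDATE big_table SET n = n + 1 WHERE id BETWEEN 100001 AND 200000;
COMMIT;
-- ...and so on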

To the OP I would suggest rethinking your processing as inserts into
one or more staging tables, followed by an INSERT ... SELECT into the
final destination table. Try to use less looping and more SQL if
possible...
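
A rough sketch of that shape (table and column names are invented for
illustration):

-- load the raw legacy rows into a staging table, e.g. via COPY
CREATE TABLE staging_raw (payload text);

-- final destination with a reference count per distinct value
CREATE TABLE items (
    item_id  serial PRIMARY KEY,
    payload  text UNIQUE,
    refcount bigint
);

-- one set-based pass: de-duplicate and compute the reference counts,
-- instead of looping with per-row UPDATEs
INSERT INTO items (payload, refcount)
SELECT payload, count(*)
FROM staging_raw
GROUP BY payload;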

merlin
