Re: Very slow update / hash join

From: Kurt Roeckx <kurt(at)roeckx(dot)be>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Very slow update / hash join
Date: 2016-05-06 18:21:42
Message-ID: 20160506182141.GA469@roeckx.be
Lists: pgsql-general

On Fri, May 06, 2016 at 10:25:34AM -0700, Jeff Janes wrote:
>
> OK, so it sounds like what is happening is that your update cannot do
> a "Heap-Only Tuple" (HOT) update, because there is not enough room in
> each data page for the new copy of rows being updated. So it is
> forced to put the new copy on a different page, which means it has to
> update all the indexes so they know where to find the new version.

That makes total sense now.
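As a quick sanity check, the HOT ratio is visible in the statistics views. Something like the following shows how many updates avoided touching the indexes (here "mytable" stands in for the actual table name):

```sql
-- Compare total updates with HOT updates for one table;
-- a low n_tup_hot_upd relative to n_tup_upd means most updates
-- had to move rows to new pages and update every index.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'mytable';
```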

> If this not a one-time event, then one thing you could do is lower the
> table's fillfactor, so that the table is more loosely packed and
> future updates are more likely to be able to do HOT updates. If the
> rows being updated are randomly scattered, it wouldn't take much
> lowering to make this happen (maybe 90). But if the rows being
> updated in a single transaction are co-located with each other, then
> you might have to lower it to below 50 before it would solve the
> problem, which might be a solution worse than the problem. When you
> change the parameter, it won't take full effect until the table has
> been completely rewritten, either due to natural churn, or running a
> VACUUM FULL or CLUSTER.

I will probably want to run this a few times. The data being
updated comes from an external tool, and whenever I add new features
or fix bugs in it I would like to update the old rows. It's normally
an insert/select-only table.

But only about 20M of the 133M current rows (about 15%) are ones
I'm really interested in. So I guess something like an 85%
fillfactor might actually help.
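That change would look something like the following (with "mytable" as a placeholder for the real table name; 85 here is just the guess above, not a measured value):

```sql
-- Leave ~15% free space on each page so future updates of scattered
-- rows can stay on the same page and remain HOT.
ALTER TABLE mytable SET (fillfactor = 85);

-- The new fillfactor only applies to newly written pages, so rewrite
-- the table to apply it everywhere. Note that VACUUM FULL takes an
-- ACCESS EXCLUSIVE lock on the table while it runs.
VACUUM FULL mytable;
```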

> If this giant update does not have to occur atomically in order for
> your application to behave correctly, then I would probably break it
> up into a series of smaller transactions. Then you could even run
> them in parallel, which would be a big help if you have a RAID (which
> can efficiently work on multiple random IO read requests in parallel)
> but not help so much if you have a single disk.

I don't care whether it's atomic or not. I actually tried doing
it in smaller batches before, and I ended up calculating that the
update would take 2 weeks to complete.
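For reference, the batching Jeff describes can be as simple as one transaction per key range, committed separately so each batch releases its locks and the work can be parallelized across ranges ("mytable", "id", and the column being set are placeholders):

```sql
-- One batch: update a contiguous slice of keys, then commit.
-- Repeat with the next range (or run several ranges in parallel
-- sessions if the storage handles concurrent random I/O well).
BEGIN;
UPDATE mytable
SET    some_col = recomputed_value
WHERE  id >= 1000000 AND id < 2000000;
COMMIT;
```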

Kurt
