Re: Turning off HOT/Cleanup sometimes

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Turning off HOT/Cleanup sometimes
Date: 2014-01-09 18:20:14
Message-ID: CA+U5nMLrVoR_LjMNFw+Cj35DzP4dZXKOEfrGeAoBeTvavJykNg@mail.gmail.com
Lists: pgsql-hackers

On 9 January 2014 17:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>>> workloads, but it means that large SQL queries and pg_dump effectively
>>> do much the same work as VACUUM, generating huge amounts of I/O and
>>> WAL on the master, the cost and annoyance of which is experienced
>>> directly by the user. That is avoided on standbys.
>
>> On a pgbench workload, though, essentially all page cleanup happens as
>> a result of HOT cleanups, like >99.9%. It might be OK to have that
>> happen for write operations, but it would be a performance disaster if
>> updates didn't try to HOT-prune. Our usual argument for doing HOT
>> pruning even on SELECT cleanups is that not doing so pessimizes
>> repeated scans, but there are clearly cases that end up worse off as a
>> result of that decision.
>
> My recollection of the discussion when HOT was developed is that it works
> that way not because anyone thought it was beneficial, but simply because
> we didn't see an easy way to know when first fetching a page whether we're
> going to try to UPDATE some tuple on the page. (And we can't postpone the
> pruning, because the query will have tuple pointers into the page later.)
> Maybe we should work a little harder on passing that information down.
> It seems reasonable to me that SELECTs shouldn't be tasked with doing
> HOT pruning.
>
>> I'm not entirely wild about adding a parameter in this area because it
>> seems that we're increasingly choosing to further expose what arguably
>> ought to be internal implementation details.
>
> I'm -1 for a parameter as well, but I think that just stopping SELECTs
> from doing pruning at all might well be a win. It's at least worthy
> of some investigation.

Turning HOT off completely would be an absolute disaster for OLTP on
high-update use cases against medium-to-large tables. That scenario is
well represented by pgbench and TPC-C. I am *not* suggesting we
recommend that, and would look for very large caveats in the docs.
(That may not have been clear; I guess I just assumed people would
know I was heavily involved in the HOT project and understood its
benefits.)

As stated, I am interested in turning off HOT in isolated,
user-specified situations, perhaps just for specific tables.

I'm not crazy about exposing magic parameters either, but then I'm not
crazy about either automatic settings or deferring things because we
don't know how to set them. In general, I prefer the idea of having a
user-settable parameter in one release, then automating it in a later
release if clear settings emerge from usage. I'll submit a patch with
a parameter, to allow experimentation, for possible removal at commit
or beta.

If I had to suggest a value for an internal parameter, I would say
that each SELECT statement should clean no more than 4 blocks. That
way current OLTP behaviour is mostly preserved while the big queries
and pg_dump don't suck in unpredictable ways.
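
To make the cap concrete, here is a rough standalone sketch of a
per-statement pruning budget. It is illustrative only: StatementState,
may_prune_block, simulate_scan and SELECT_PRUNE_LIMIT are hypothetical
names, not taken from any patch or from the PostgreSQL source, and it
assumes that write statements keep pruning without a limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap from the suggestion above; not a real PostgreSQL setting. */
#define SELECT_PRUNE_LIMIT 4

/* Hypothetical per-statement state; field names are illustrative only. */
typedef struct StatementState
{
    bool is_select;      /* true for a read-only statement */
    int  blocks_pruned;  /* blocks this statement has cleaned so far */
} StatementState;

/*
 * Decide whether the statement may prune the block it has just fetched.
 * Writes are never limited (an assumption of this sketch); a SELECT stops
 * pruning once it has cleaned SELECT_PRUNE_LIMIT blocks.
 */
bool
may_prune_block(StatementState *stmt, bool block_needs_pruning)
{
    assert(stmt != NULL);

    if (!block_needs_pruning)
        return false;

    if (stmt->is_select && stmt->blocks_pruned >= SELECT_PRUNE_LIMIT)
        return false;       /* budget used up: read the block as-is */

    stmt->blocks_pruned++;
    return true;
}

/*
 * Simulate one statement scanning nblocks blocks that all need pruning,
 * and return how many of them it actually pruned.
 */
int
simulate_scan(bool is_select, int nblocks)
{
    StatementState stmt = { is_select, 0 };

    for (int i = 0; i < nblocks; i++)
        (void) may_prune_block(&stmt, true);

    return stmt.blocks_pruned;
}
```

In this model a SELECT that touches only a handful of blocks behaves
as it does today, while a large scan such as a pg_dump stops cleaning
after its fourth block; a call like simulate_scan(true, 1000) prunes
4 blocks and simulate_scan(false, 1000) prunes all 1000.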

I'll submit the patch and we can talk some more.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
