Re: Freeze avoidance of very large table.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-04-22 14:33:11
Message-ID: CA+TgmoZ5E3arm=-zbwhB9jT4UMDkFzMU67ydvG24wV7CRpp3PQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 21, 2015 at 7:24 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 4/21/15 3:21 PM, Robert Haas wrote:
>> It's possible that we could use this infrastructure to freeze more
>> aggressively in other circumstances. For example, perhaps VACUUM
>> should freeze any page it intends to mark all-visible. That's not a
>> guaranteed win, because it might increase WAL volume: setting a page
>> all-visible does not emit an FPI for that page, but freezing any tuple
>> on it would, if the page hasn't otherwise been modified since the last
>> checkpoint. Even if that were no issue, the freezing itself must be
>> WAL-logged. But if we could somehow get to a place where all-visible
>> => frozen, then autovacuum would never need to visit all-visible
>> pages, a huge win.
>
> I don't know how bad the extra WAL traffic would be; we'd obviously need to
> incur it eventually, so it's a question of how common it is for a page to go
> all-visible but then go not-all-visible again before freezing. It would
> presumably be far more traffic than some form of a FrozenMap though...

Yeah, maybe. The freeze record contains details for each TID, while
the freeze map bit would only need to be set once for the whole page.
I wonder if the format of that record could be optimized somehow.
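
(A rough sketch of why the per-tuple record is bulky, following the
9.4-era xl_heap_freeze_tuple from heapam_xlog.h -- treat the exact
layout as an approximation. One such entry is emitted per frozen
tuple, versus a single bit per page in a freeze map:)

    typedef struct xl_heap_freeze_tuple
    {
        TransactionId xmax;       /* recomputed xmax for the tuple */
        OffsetNumber  offset;     /* tuple's line-pointer offset */
        uint16        infomask2;  /* replacement t_infomask2 bits */
        uint16        infomask;   /* replacement t_infomask bits */
        uint8         frz_flags;  /* freeze-specific flags */
    } xl_heap_freeze_tuple;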

>> We could also attack the problem from the other end. Instead of
>> trying to set the bits on the individual tuples, we could decide that
>> whenever a page is marked all-visible, we regard it as frozen
>> regardless of the bits set or not set on the individual tuples.
>> Anybody who wants to modify the page must freeze any unfrozen tuples
>> "for real" before clearing the visibility map bit. This would have
>> the same end result as the previous idea: all-visible would
>> essentially imply frozen, and autovacuum could ignore those pages
>> categorically.
>
> Pushing what's currently background work onto foreground processes doesn't
> seem like a good idea...

When you phrase it that way, no, but pushing work that otherwise would
need to be done right now off to a future time that may never arrive
sounds like a good idea. Today, we freeze the page -- rewriting it --
and then keep scanning those all-frozen pages every X number of
transactions to make sure they are really all-frozen. In this system,
we'd eliminate the repeated scanning and defer the freeze work until
the page actually gets modified again. But that might never happen,
in which case we never have to do the work at all.
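
(To make the proposed rule concrete, a minimal sketch of what a writer
would do on finding the all-visible bit set -- the helper
heap_freeze_unfrozen_tuples is hypothetical, not an existing entry
point:)

    if (PageIsAllVisible(page))
    {
        /* Under the proposed invariant, all-visible implies frozen,
         * so freeze any not-yet-frozen tuples "for real" (WAL-logging
         * that work) before the bit comes off. */
        heap_freeze_unfrozen_tuples(relation, buffer);  /* hypothetical */

        PageClearAllVisible(page);
        visibilitymap_clear(relation, blkno, vmbuffer);
    }
    /* ...then proceed with the insert/update/delete as usual. */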

>> I'm not saying those ideas don't have problems, because they do. But
>> I think they are worth further exploring. The main reason I gave up
>> on that is because Heikki was working on the XID-to-LSN mapping stuff.
>> That seemed like a better approach than either of the above, so as
>> long as Heikki was working on that, there wasn't much reason to pursue
>> more lowbrow approaches. Clearly, though, we need to do something
>> about this. Freezing is a big problem for lots of users.
>
> Did XID-LSN die? I see at the bottom of the thread it was returned with
> feedback; I guess Heikki just hasn't had time and there are no major blockers?
> From what I remember, this is probably a better solution, but if it's not
> going to make it into 9.6 then we should probably at least look further into
> a FM.

Heikki said he'd lost enthusiasm for it, but he wasn't too specific
about his reasons, IIRC. I guess maybe just that it got complicated,
and he wasn't sure it was correct.

>> All that having been said, I don't think adding a new fork is a good
>> approach. We already have problems pretty commonly where our
>> customers complain about running out of inodes. Adding another fork
>> for every table would exacerbate that problem considerably.
>
> Andres' idea of adding this to the VM may work well to handle that. It would
> double the size of the VM, but that would still be a ratio of roughly
> 32,768:1 relative to heap size, or 2MB for a 64GB table.

Yes, that's got some potential. It would mean pg_upgrade would have
to remove all existing visibility maps when upgrading to the new
version, or rewrite them into the new format. But it otherwise seems
promising.
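
(Spelling out the arithmetic from the quote above -- a back-of-envelope
sketch assuming the default 8kB block size and two map bits per heap
page, as proposed:)

    #define BLCKSZ              8192  /* default heap page size */
    #define BITS_PER_HEAPBLOCK  2     /* all-visible + all-frozen; was 1 */
    #define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)   /* = 4 */

    /* Each map byte covers 4 heap pages = 32kB of heap, a 32768:1
     * ratio; a 64GB heap therefore needs 64GB / 32768 = 2MB of map. */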

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
