Re: On-the-fly index tuple deletion vs. hot_standby

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-the-fly index tuple deletion vs. hot_standby
Date: 2011-06-12 19:01:44
Message-ID: 20110612190144.GE21098@tornado.leadboat.com
Lists: pgsql-hackers

On Sun, Jun 12, 2011 at 12:15:29AM -0400, Robert Haas wrote:
> On Sat, Jun 11, 2011 at 11:40 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > We currently achieve that wait-free by first marking the page with the next
> > available xid and then reusing it when that mark (btpo.xact) predates the
> > oldest running xid (RecentXmin). (At the moment, I'm failing to work out why
> > this is OK with scans from transactions that haven't allocated an xid, but I
> > vaguely recall convincing myself it was fine at one point.) It would indeed
> > also be enough to call GetLockConflicts(locktag-of-index, AccessExclusiveLock)
> > and check whether any of the returned transactions have PGPROC.xmin below the
> > mark. That's notably more expensive than just comparing RecentXmin, so I'm
> > not sure how well it would pay off overall. However, it could only help us on
> > the master. (Not strictly true, but any way I see to extend it to the standby
> > has critical flaws.) On the master, we can see a conflicting transaction and
> > put off reusing the page. By the time the record hits the standby, we have to
> > apply it, and we might have a running transaction that will hold a lock on the
> > index for the next, say, 72 hours. At such times, vacuum_defer_cleanup_age or
> > hot_standby_feedback ought to prevent the recovery stall.
> >
> > This did lead me to realize that what we do in this regard on the standby can
> > be considerably independent from what we do on the master. If fruitful, the
> > standby can prove the absence of a scan holding a right-link in a completely
> > different fashion. So, we *could* take the cleanup-lock approach on the
> > standby without changing very much on the master.
> > standby without changing very much on the master.
>
> Well, I'm generally in favor of trying to fix this problem without
> changing what the master does. It's a weakness of our replication
> technology that the standby has no better way to cope with a cleanup
> operation on the master than to start killing queries, but then again
> it's a weakness of our MVCC technology that we don't reuse space
> quickly enough and end up with bloat. I hear a lot more complaints
> about the second weakness than I do about the first.

I fully agree. That said, if this works on the standby, we may as well also use
it opportunistically on the master, to throttle bloat.
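
For reference, the existing wait-free test described in the quoted text (mark
the deleted page with the next available xid, reuse it once that mark predates
the oldest running xid) amounts to roughly the following. This is a standalone
sketch, not code from the tree: DeletedPage, xid_precedes and
page_is_recyclable are names made up for illustration, and the comparison
ignores the special (non-normal) xids that the real comparison routines
handle.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "a is older than b" using modulo-2^32 arithmetic.
 * Assumption: both xids are ordinary ones; special xids are not handled.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical stand-in for the metadata kept on a deleted btree page. */
typedef struct DeletedPage
{
    TransactionId mark_xid;     /* next available xid when the page was deleted */
} DeletedPage;

/*
 * The page may be reused once its mark predates the oldest running xid
 * (the role RecentXmin plays in the discussion above).
 */
static bool
page_is_recyclable(const DeletedPage *page, TransactionId oldest_running_xid)
{
    return xid_precedes(page->mark_xid, oldest_running_xid);
}
```

The point of the sketch is only that the master-side test is a single
comparison against a value we already have, which is why anything involving
GetLockConflicts looks expensive next to it.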

> At any rate, if taking a cleanup lock on the right-linked page on the
> standby is sufficient to fix the problem, that seems like a far
> superior solution in any case. Presumably the frequency of someone
> having a pin on that particular page will be far lower than any
> matching based on XID or heavyweight locks. And the vast majority of
> such pins should disappear before the startup process feels obliged to
> get out its big hammer.

Yep; looks promising.
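
To make the cleanup-lock idea concrete, here is a toy, single-threaded model
of the rule being relied on: a cleanup lock is granted only when the
requester's own pin is the sole pin on the page. ToyBuffer and the toy_*
functions are invented for this sketch and are not the buffer manager's API;
the real thing also has to deal with concurrency and waiting, which this
deliberately leaves out.

```c
#include <stdbool.h>

/* Hypothetical toy model of a shared buffer; not the real buffer descriptor. */
typedef struct ToyBuffer
{
    int  pin_count;             /* how many backends currently pin the page */
    bool exclusively_locked;    /* is an exclusive content lock held? */
} ToyBuffer;

static void
toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

static void
toy_unpin(ToyBuffer *buf)
{
    buf->pin_count--;
}

/*
 * Try to take a cleanup lock without waiting.  The caller must already hold
 * its own pin, so success requires pin_count == 1 (nobody else, such as a
 * scan that remembered a right-link, is looking at the page) and no
 * exclusive lock already being held.
 */
static bool
toy_try_cleanup_lock(ToyBuffer *buf)
{
    if (buf->exclusively_locked || buf->pin_count != 1)
        return false;
    buf->exclusively_locked = true;
    return true;
}

static void
toy_unlock(ToyBuffer *buf)
{
    buf->exclusively_locked = false;
}
```

In this model the startup process would pin the right-linked page, retry
toy_try_cleanup_lock until the scan's pin goes away, and only then apply the
record; the hope expressed above is that such pins are short-lived enough
that the retry rarely runs long.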

Does such a thing have a chance of being backpatchable? I think the chances
start slim and fall almost to zero on account of the difficulty of avoiding a
WAL format change. Assuming that conclusion, I do think it's worth starting
with something simple, even if it means additional bloat on the master in the
wal_level=hot_standby + vacuum_defer_cleanup_age / hot_standby_feedback case.
In choosing those settings, the administrator has taken constructive steps to
accept master-side bloat in exchange for delaying recovery conflict. What's
your opinion?

Thanks,
nm
