From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-the-fly index tuple deletion vs. hot_standby |
Date: | 2011-06-12 19:01:44 |
Message-ID: | 20110612190144.GE21098@tornado.leadboat.com |
Lists: | pgsql-hackers |
On Sun, Jun 12, 2011 at 12:15:29AM -0400, Robert Haas wrote:
> On Sat, Jun 11, 2011 at 11:40 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > We currently achieve that wait-free by first marking the page with the next
> > available xid and then reusing it when that mark (btpo.xact) predates the
> > oldest running xid (RecentXmin). (At the moment, I'm failing to work out why
> > this is OK with scans from transactions that haven't allocated an xid, but I
> > vaguely recall convincing myself it was fine at one point.) It would indeed
> > also be enough to call GetLockConflicts(locktag-of-index, AccessExclusiveLock)
> > and check whether any of the returned transactions have PGPROC.xmin below the
> > mark. That's notably more expensive than just comparing RecentXmin, so I'm
> > not sure how well it would pay off overall. However, it could only help us on
> > the master. (Not strictly true, but any way I see to extend it to the standby
> > has critical flaws.) On the master, we can see a conflicting transaction and
> > put off reusing the page. By the time the record hits the standby, we have to
> > apply it, and we might have a running transaction that will hold a lock on the
> > index for the next, say, 72 hours. At such times, vacuum_defer_cleanup_age or
> > hot_standby_feedback ought to prevent the recovery stall.
> >
> > This did lead me to realize that what we do in this regard on the standby can
> > be considerably independent from what we do on the master. If fruitful, the
> > standby can prove the absence of a scan holding a right-link in a completely
> > different fashion. So, we *could* take the cleanup-lock approach on the
> > standby without changing very much on the master.
> > standby without changing very much on the master.
>
> Well, I'm generally in favor of trying to fix this problem without
> changing what the master does. It's a weakness of our replication
> technology that the standby has no better way to cope with a cleanup
> operation on the master than to start killing queries, but then again
> it's a weakness of our MVCC technology that we don't reuse space
> quickly enough and end up with bloat. I hear a lot more complaints
> about the second weakness than I do about the first.
I fully agree. That said, if this works on the standby, we may as well also use
it opportunistically on the master, to throttle bloat.
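
For reference, the master-side test quoted above (stamp the deleted page with the next available xid, reuse it once that stamp predates the oldest running xid) can be sketched roughly as follows. This is an illustration only, not the nbtree code: xid_t, deleted_page, mark_deleted and page_is_recyclable are hypothetical names, deleted_xact stands in for btpo.xact, and the sketch assumes a strict "older than" comparison over 32-bit xids that wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;            /* hypothetical stand-in for a transaction id */

/* Wraparound-aware "a is older than b": compare modulo 2^32. */
static bool xid_precedes(xid_t a, xid_t b)
{
    return (int32_t)(a - b) < 0;
}

typedef struct {
    xid_t deleted_xact;            /* plays the role of btpo.xact in this sketch */
} deleted_page;

/* At deletion time, stamp the page with the next xid to be assigned. */
static void mark_deleted(deleted_page *page, xid_t next_xid)
{
    page->deleted_xact = next_xid;
}

/*
 * The page may be reused once its stamp is older than the oldest running
 * xid; no transaction old enough to still hold a link to it can remain.
 */
static bool page_is_recyclable(const deleted_page *page, xid_t oldest_running_xid)
{
    return xid_precedes(page->deleted_xact, oldest_running_xid);
}
```

A page stamped with 1000 is not reusable while the oldest running xid is 1000 or lower, and becomes reusable once it reaches 1001. The test never waits; it just declines to reuse the page for now.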
> At any rate, if taking a cleanup lock on the right-linked page on the
> standby is sufficient to fix the problem, that seems like a far
> superior solution in any case. Presumably the frequency of someone
> having a pin on that particular page will be far lower than any
> matching based on XID or heavyweight locks. And the vast majority of
> such pins should disappear before the startup process feels obliged to
> get out its big hammer.
Yep; looks promising.
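
To make the idea concrete, here is a toy model of a cleanup lock: an exclusive lock that is granted only while the requester holds the sole pin on the buffer. It is a sketch under assumptions, not the buffer manager's implementation: toy_buffer and the toy_* functions are hypothetical, there is no real concurrency control, and it assumes (as this discussion does) that a scan still able to follow a right-link to the page shows up as a pin on it.

```c
#include <stdbool.h>

typedef struct {
    int  pin_count;        /* number of pins currently held on the buffer */
    bool exclusive_held;   /* true while someone holds the exclusive lock */
} toy_buffer;

static void toy_pin(toy_buffer *buf)   { buf->pin_count++; }
static void toy_unpin(toy_buffer *buf) { buf->pin_count--; }

/*
 * Try to take a cleanup lock.  The caller must already hold its own pin.
 * The lock is granted only if nobody else has the buffer pinned, so a
 * pinned-but-unlocked reader is enough to make the attempt fail.
 */
static bool toy_try_cleanup_lock(toy_buffer *buf)
{
    if (buf->exclusive_held || buf->pin_count != 1)
        return false;
    buf->exclusive_held = true;
    return true;
}

static void toy_unlock(toy_buffer *buf)
{
    buf->exclusive_held = false;
}
```

In this model the replaying process pins the page and retries until toy_try_cleanup_lock succeeds; an attempt fails only while some other backend has that one page pinned, which is a much narrower condition than an XID or heavyweight-lock match.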
Does such a thing have a chance of being backpatchable? I think the chances
start slim and fall almost to zero on account of the difficulty of avoiding a
WAL format change. Assuming that conclusion, I do think it's worth starting
with something simple, even if it means additional bloat on the master in the
wal_level=hot_standby + vacuum_defer_cleanup_age / hot_standby_feedback case.
In choosing those settings, the administrator has taken constructive steps to
accept master-side bloat in exchange for delaying recovery conflict. What's
your opinion?
Thanks,
nm