From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Freezing without write I/O |
Date: | 2013-05-30 18:39:46 |
Message-ID: | CA+TgmoZ6YEYfXQRi=YM5WWJ5raG9PKQpzDcim+3YJhFzyo3yrw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, May 30, 2013 at 9:33 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> The reason we have to freeze is that otherwise our 32-bit XIDs wrap around
> and become ambiguous. The obvious solution is to extend XIDs to 64 bits, but
> that would waste a lot of space. The trick is to add a field to the page header
> indicating the 'epoch' of the XID, while keeping the XIDs in tuple header
> 32-bit wide (*).
Check.
> The other reason we freeze is to truncate the clog. But with 64-bit XIDs, we
> wouldn't actually need to change old XIDs on disk to FrozenXid. Instead, we
> could implicitly treat anything older than relfrozenxid as frozen.
Check.
> That's the basic idea. Vacuum freeze only needs to remove dead tuples, but
> doesn't need to dirty pages that contain no dead tuples.
Check.
> Since we're not storing 64-bit wide XIDs on every tuple, we'd still need to
> replace the XIDs with FrozenXid whenever the difference between the smallest
> and largest XID on a page exceeds 2^31. But that would only happen when
> you're updating the page, in which case the page is dirtied anyway, so it
> wouldn't cause any extra I/O.
It would cause some extra WAL activity, but it wouldn't dirty the page
an extra time.
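
To make the condition in the quoted paragraph concrete, here is a minimal C sketch. The function name and the idea of passing the page's smallest and largest XIDs as 64-bit values are assumptions made for illustration, not existing PostgreSQL code. The exact boundary (strictly greater than 2^31 versus greater-or-equal) would depend on how the 32-bit values end up being interpreted; the sketch follows the wording "exceeds 2^31".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the XIDs on a page span more
 * than 2^31 transactions, i.e. when 32-bit tuple XIDs could no longer
 * be disambiguated and the oldest ones would have to be replaced with
 * FrozenXid. Both arguments are full 64-bit XIDs, min_xid <= max_xid.
 */
static bool
page_xid_span_too_wide(uint64_t min_xid, uint64_t max_xid)
{
    return (max_xid - min_xid) > (UINT64_C(1) << 31);
}
```

In the scheme described above, a check of this kind would run only while the page is already being modified, which is why it would add WAL traffic but no additional page write.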
> This would also be the first step in allowing the clog to grow larger than 2
> billion transactions, eliminating the need for anti-wraparound freezing
> altogether. You'd still want to truncate the clog eventually, but it would
> be nice to not be pressed against the wall with "run vacuum freeze now, or
> the system will shut down".
Interesting. That seems like a major advantage.
> (*) "Adding an epoch" is inaccurate, but I like to use that as my mental
> model. If you just add a 32-bit epoch field, then you cannot have xids from
> different epochs on the page, which would be a problem. In reality, you
> would store one 64-bit XID value in the page header, and use that as the
> "reference point" for all the 32-bit XIDs on the tuples. See existing
> convert_txid() function for how that works. Another method is to store the
> 32-bit xid values in tuple headers as offsets from the per-page 64-bit
> value, but then you'd always need to have the 64-bit value at hand when
> interpreting the XIDs, even if they're all recent.
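
The "reference point" idea in the footnote can be sketched in C roughly as follows. This is only an illustration in the spirit of the convert_txid() function mentioned above, not a copy of it: the name widen_xid is hypothetical, the sketch assumes every tuple XID on the page lies within 2^31 of the page's reference XID, and it ignores the special XIDs (such as FrozenXid), which would need separate handling.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reconstruct the full 64-bit XID of a tuple from
 * its 32-bit on-disk XID and the 64-bit reference XID stored in the
 * page header.
 *
 * Assumption: the true 64-bit value is within 2^31 of page_ref_xid, so
 * the signed modulo-2^32 distance between the two identifies it
 * uniquely. The uint32 -> int32 cast relies on two's-complement
 * wraparound, as PostgreSQL's own XID comparisons do.
 */
static uint64_t
widen_xid(uint64_t page_ref_xid, uint32_t tuple_xid)
{
    int32_t delta = (int32_t) (tuple_xid - (uint32_t) page_ref_xid);

    return page_ref_xid + (int64_t) delta;
}
```

For example, with a reference XID of 2^32 + 100, a tuple XID of 50 maps to 2^32 + 50, while a tuple XID of 0xFFFFFFF0 maps to 0xFFFFFFF0 in the previous "epoch". This is how XIDs from two different epochs can coexist on one page.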
As I see it, the main downsides of this approach are:
(1) It breaks binary compatibility (unless you do something to
provide for it, like put the epoch in the special space).
(2) It consumes 8 bytes per page. I think it would be possible to get
this down to say 5 bytes per page pretty easily; we'd simply decide
that the low-order 3 bytes of the reference XID must always be 0.
Possibly you could even do with 4 bytes, or 4 bytes plus some number
of extra bits.
(3) You still need to periodically scan the entire relation, or else
have a freeze map as Simon and Josh suggested.
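
The 5-byte idea in point (2) can be sketched as below. This is a rough illustration under stated assumptions: the function names, the REF_XID_SHIFT constant, and the little-endian byte order are all hypothetical and not taken from any patch. It assumes the reference XID is always rounded down to a multiple of 2^24, so that its low-order 3 bytes are zero and need not be stored.

```c
#include <stdint.h>

#define REF_XID_SHIFT 24        /* low-order 3 bytes are assumed zero */

/* Round an XID down so that it is usable as a 5-byte reference XID. */
static uint64_t
round_ref_xid(uint64_t xid)
{
    return xid & ~((UINT64_C(1) << REF_XID_SHIFT) - 1);
}

/* Store the upper 40 bits of a rounded reference XID in 5 bytes. */
static void
pack_ref_xid(uint64_t ref_xid, uint8_t out[5])
{
    uint64_t hi = ref_xid >> REF_XID_SHIFT;

    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t) (hi >> (8 * i));
}

/* Rebuild the 64-bit reference XID from its 5-byte on-page form. */
static uint64_t
unpack_ref_xid(const uint8_t in[5])
{
    uint64_t hi = 0;

    for (int i = 0; i < 5; i++)
        hi |= (uint64_t) in[i] << (8 * i);

    return hi << REF_XID_SHIFT;
}
```

The cost of the rounding is that the reference point can no longer be chosen freely: it may sit up to 2^24 - 1 XIDs below the value one would otherwise pick, which shifts the window of XIDs representable on the page by the same amount.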
The upsides of this approach as compared with what Andres and I are
proposing are:
(1) It provides a stepping stone towards allowing indefinite expansion
of CLOG, which is quite appealing as an alternative to a hard
shut-down.
(2) It doesn't place any particular requirements on PD_ALL_VISIBLE. I
don't personally find this much of a benefit as I want to keep
PD_ALL_VISIBLE, but I know Jeff and perhaps others disagree.
Random thought: Could you compute the reference XID based on the page
LSN? That would eliminate the storage overhead.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company