From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WAL format and API changes (9.5) |
Date: | 2014-04-08 14:35:22 |
Message-ID: | CAA4eK1Juw0ySkMaQOM55QH1K-DoRBo+OvCcPCv1V7grYETKULw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Apr 3, 2014 at 7:44 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> I'd like to do some changes to the WAL format in 9.5. I want to annotate
> each WAL record with the blocks that they modify. Every WAL record already
> includes that information, but it's done in an ad hoc way, differently in
> every rmgr. The RelFileNode and block number are currently part of the WAL
> payload, and it's the REDO routine's responsibility to extract it. I want to
> include that information in a common format for every WAL record type.
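To make the idea of a common per-record block reference concrete, here is a minimal sketch of what such a header might look like, and how any tool could parse it without rmgr-specific knowledge. The struct, the layout (a 1-byte count followed by 17-byte references in host byte order), and the function names are hypothetical illustrations, not the format in the proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-block reference carried by every WAL record. */
typedef struct BlockRef
{
    uint32_t spcNode;  /* tablespace OID */
    uint32_t dbNode;   /* database OID */
    uint32_t relNode;  /* relation filenode */
    uint8_t  forknum;  /* fork: main, FSM, visibility map, ... */
    uint32_t blkno;    /* block number within the fork */
} BlockRef;

/* Serialized size of one reference: four uint32 fields plus one byte. */
#define BLOCKREF_SIZE (4 * sizeof(uint32_t) + 1)

static size_t put_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);  /* host byte order; WAL is not portable anyway */
    return sizeof v;
}

static size_t get_u32(const unsigned char *p, uint32_t *v)
{
    memcpy(v, p, sizeof *v);
    return sizeof *v;
}

/* Writes a 1-byte count followed by each reference; returns bytes written. */
size_t encode_block_refs(const BlockRef *refs, uint8_t nrefs, unsigned char *out)
{
    size_t off = 0;

    out[off++] = nrefs;
    for (uint8_t i = 0; i < nrefs; i++)
    {
        off += put_u32(out + off, refs[i].spcNode);
        off += put_u32(out + off, refs[i].dbNode);
        off += put_u32(out + off, refs[i].relNode);
        out[off++] = refs[i].forknum;
        off += put_u32(out + off, refs[i].blkno);
    }
    assert(off == 1 + nrefs * BLOCKREF_SIZE);
    return off;
}

/* Parses the header; returns the number of references, or -1 if malformed. */
int decode_block_refs(const unsigned char *in, size_t len,
                      BlockRef *refs, uint8_t maxrefs)
{
    if (len < 1)
        return -1;

    uint8_t n = in[0];
    if (n > maxrefs || len < 1 + n * BLOCKREF_SIZE)
        return -1;

    size_t off = 1;
    for (uint8_t i = 0; i < n; i++)
    {
        off += get_u32(in + off, &refs[i].spcNode);
        off += get_u32(in + off, &refs[i].dbNode);
        off += get_u32(in + off, &refs[i].relNode);
        refs[i].forknum = in[off++];
        off += get_u32(in + off, &refs[i].blkno);
    }
    return n;
}
```

The point of the design is that the decoder never looks at the rmgr ID: the block references sit in front of the rmgr-specific payload in the same format for every record type.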
>
> That makes life a lot easier for tools that are interested in knowing which
> blocks a WAL record modifies. One such tool is pg_rewind; it currently has
> to understand every WAL record the backend writes. There's also a tool out
> there called pg_readahead, which does prefetching of blocks accessed by WAL
> records, to speed up PITR. I don't think that tool has been actively
> maintained, but at least part of the reason for that is probably that it's a
> pain to maintain when it has to understand the details of every WAL record
> type.
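The benefit for a tool like pg_rewind can be sketched as follows, assuming records have already been decoded into a generic list of block references. The DecodedRecord and BlockSet types and the function names are hypothetical. The tool only collects distinct (relation, block) pairs and never inspects the rmgr-specific payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic view of a block: which relation, which block. */
typedef struct BlockTag
{
    uint32_t relNode;
    uint32_t blkno;
} BlockTag;

/* Hypothetical decoded record: only the common block references. */
typedef struct DecodedRecord
{
    int      nblocks;
    BlockTag blocks[4];
} DecodedRecord;

/* Fixed-capacity set of distinct blocks; storage supplied by the caller. */
typedef struct BlockSet
{
    BlockTag *items;
    size_t    n;
    size_t    cap;
} BlockSet;

static bool blockset_contains(const BlockSet *set, BlockTag tag)
{
    for (size_t i = 0; i < set->n; i++)
        if (set->items[i].relNode == tag.relNode &&
            set->items[i].blkno == tag.blkno)
            return true;
    return false;
}

/* Adds tag if absent; returns true if it was newly added. */
static bool blockset_add(BlockSet *set, BlockTag tag)
{
    if (blockset_contains(set, tag))
        return false;
    assert(set->n < set->cap);
    set->items[set->n++] = tag;
    return true;
}

/*
 * Collects every block touched by a run of records, without knowing which
 * rmgr wrote them. Returns the number of newly seen blocks.
 */
size_t collect_modified_blocks(const DecodedRecord *recs, size_t nrecs,
                               BlockSet *out)
{
    size_t added = 0;

    for (size_t r = 0; r < nrecs; r++)
        for (int b = 0; b < recs[r].nblocks; b++)
            if (blockset_add(out, recs[r].blocks[b]))
                added++;
    return added;
}
```

The linear search keeps the sketch short; a real tool would use a hash table keyed on the full RelFileNode, fork and block number.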
>
> It'd also be nice for contrib/pg_xlogdump and backend code itself. The
> boilerplate code in all WAL redo routines, and writing WAL records, could be
> simplified.
I think it will also be useful if we want to implement table/tablespace
PITR.
>
> That's for the normal cases. We'll need a couple of variants for also
> registering buffers that don't need full-page images, and perhaps also a
> function for registering a page that *always* needs a full-page image,
> regardless of the LSN. A few existing WAL record types just WAL-log the
> whole page, so those ad-hoc full-page images could be replaced with this.
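The registration variants described above come down to one decision per buffer at insert time. This sketch uses hypothetical mode names; the LSN comparison is the usual full-page-write rule, under which a page needs a backup image if its LSN is not newer than the redo pointer of the latest checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical names for the registration variants. */
typedef enum FpiMode
{
    FPI_IF_NEEDED, /* normal case: image only if the page LSN calls for it */
    FPI_NEVER,     /* buffer registered, but never backed up */
    FPI_ALWAYS     /* always include an image, regardless of the LSN */
} FpiMode;

/*
 * Decides whether a registered buffer gets a full-page image.
 * page_writes_enabled stands for full_page_writes (or page writes forced
 * during an online backup). In the normal case a page needs an image if it
 * has not been modified since the latest checkpoint's redo pointer.
 */
bool needs_full_page_image(FpiMode mode, XLogRecPtr page_lsn,
                           XLogRecPtr redo_ptr, bool page_writes_enabled)
{
    switch (mode)
    {
        case FPI_ALWAYS:
            return true;
        case FPI_NEVER:
            return false;
        case FPI_IF_NEEDED:
            return page_writes_enabled && page_lsn <= redo_ptr;
    }
    assert(!"unknown FpiMode");
    return false;
}
```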
>
> With these changes, a typical WAL insertion would look like this:
>
> /* register the buffer with the WAL record, with ID 0 */
> XLogRegisterBuffer(0, buf, true);
>
> rdata[0].data = (char *) &xlrec;
> rdata[0].len = sizeof(BlahRecord);
> rdata[0].buffer_id = -1; /* -1 means the data is always included */
> rdata[0].next = &(rdata[1]);
>
> rdata[1].data = (char *) mydata;
> rdata[1].len = mydatalen;
> rdata[1].buffer_id = 0; /* 0 here refers to the buffer registered above */
> rdata[1].next = NULL;
>
> ...
> recptr = XLogInsert(RM_BLAH_ID, xlinfo, rdata);
>
> PageSetLSN(buf, recptr);
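A small self-contained mock of how such an rdata chain could be assembled, assuming (as the quoted example suggests) that a chunk tagged with a buffer_id is left out when that buffer is backed up by a full-page image, the way data tied to a buffer is handled today. RecData, RecordState, register_buffer() and assemble_payload() are hypothetical stand-ins. To keep it short, the sketch passes the full-page-image decision directly to register_buffer(); its boolean is not the same as the `true` in the quoted XLogRegisterBuffer() call, and real code would make that decision inside XLogInsert() from the page LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REGISTERED_BUFFERS 4

/* Mirrors the rdata fields in the example above; not the real XLogRecData. */
typedef struct RecData
{
    const char     *data;
    size_t          len;
    int             buffer_id;  /* -1: always included; else a registered ID */
    struct RecData *next;
} RecData;

/* Hypothetical per-record registration state. */
typedef struct RecordState
{
    bool registered[MAX_REGISTERED_BUFFERS];
    bool take_image[MAX_REGISTERED_BUFFERS];
} RecordState;

void register_buffer(RecordState *st, int id, bool take_image)
{
    assert(id >= 0 && id < MAX_REGISTERED_BUFFERS);
    st->registered[id] = true;
    st->take_image[id] = take_image;
}

/*
 * Concatenates the payload into out. A chunk tied to a buffer is skipped
 * when that buffer gets a full-page image, because replay restores the
 * image instead of reapplying the change. Returns the payload length.
 */
size_t assemble_payload(const RecordState *st, const RecData *rdata,
                        char *out, size_t outsz)
{
    size_t off = 0;

    for (const RecData *rd = rdata; rd != NULL; rd = rd->next)
    {
        if (rd->buffer_id >= 0)
        {
            assert(rd->buffer_id < MAX_REGISTERED_BUFFERS &&
                   st->registered[rd->buffer_id]);
            if (st->take_image[rd->buffer_id])
                continue;
        }
        assert(off + rd->len <= outsz);
        memcpy(out + off, rd->data, rd->len);
        off += rd->len;
    }
    return off;
}
```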
If we register buffers (whether or not they require an FPI) before
calling XLogInsert(), will that have any impact on handling the case
where we find out only after taking WALInsertLock that a buffer needs
to be backed up? Or will that part of the code remain the same as it
is today?
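For reference, the late decision in question follows roughly this pattern in today's XLogInsert(): full-page-image decisions are made against a cached redo pointer without holding the lock; after the insertion lock is acquired, the code checks whether a checkpoint has moved the redo pointer so that a buffer it skipped now needs an image, and if so it releases the lock and starts over. The sketch simulates that with hypothetical names (SharedWal, insert_record) and a fake checkpoint that fires the first time the lock is taken.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Simulated shared WAL state (hypothetical). */
typedef struct SharedWal
{
    XLogRecPtr redo_ptr;          /* redo pointer of the latest checkpoint */
    int        lock_acquisitions; /* counts lock round trips, for the example */
    XLogRecPtr pending_redo;      /* if nonzero, a checkpoint "happens" on next lock */
} SharedWal;

static void acquire_insert_lock(SharedWal *wal)
{
    /* Simulate a checkpoint that moved the redo pointer before we got the lock. */
    if (wal->pending_redo != 0)
    {
        wal->redo_ptr = wal->pending_redo;
        wal->pending_redo = 0;
    }
    wal->lock_acquisitions++;
}

static void release_insert_lock(SharedWal *wal)
{
    (void) wal;  /* nothing to do in the simulation */
}

/* Bitmask of buffers whose page LSN is not newer than redo. */
static unsigned buffers_needing_image(const XLogRecPtr *page_lsns, int nbufs,
                                      XLogRecPtr redo)
{
    unsigned mask = 0;

    assert(nbufs <= 32);
    for (int i = 0; i < nbufs; i++)
        if (page_lsns[i] <= redo)
            mask |= 1u << i;
    return mask;
}

/* Returns the bitmask of buffers backed up in the inserted record. */
unsigned insert_record(SharedWal *wal, const XLogRecPtr *page_lsns, int nbufs)
{
    XLogRecPtr redo = wal->redo_ptr;  /* cached copy, read without the lock */

    for (;;)
    {
        /* Building the record (and these decisions) happens outside the lock. */
        unsigned backup = buffers_needing_image(page_lsns, nbufs, redo);

        acquire_insert_lock(wal);
        unsigned now = buffers_needing_image(page_lsns, nbufs, wal->redo_ptr);
        if ((now & ~backup) == 0)
        {
            /* Decisions still valid: space would be reserved and copied here. */
            release_insert_lock(wal);
            return backup;
        }
        /* A buffer we skipped now needs an image: refresh and start over. */
        redo = wal->redo_ptr;
        release_insert_lock(wal);
    }
}
```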
> Redo
> ----
>
> There are four different states a block referenced by a typical WAL record
> can be in:
>
> 1. The old page does not exist at all (because the relation was truncated
> later)
> 2. The old page exists, but has an LSN higher than current WAL record, so it
> doesn't need replaying.
> 3. The LSN is < current WAL record, so it needs to be replayed.
> 4. The WAL record contains a full-page image, which needs to be restored.
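The four states map onto a single classification that a common redo helper could return, so that each redo routine only handles the outcome. The enum and function names are hypothetical. The check order assumes a full-page image is always restored, and the sketch treats a page LSN equal to the record's LSN as already replayed, which the list above leaves implicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical names for the four states listed above. */
typedef enum BlockRedoState
{
    BLOCK_NOT_FOUND,     /* 1. page no longer exists (relation truncated later) */
    BLOCK_ALREADY_DONE,  /* 2. page LSN >= record LSN: nothing to replay */
    BLOCK_NEEDS_REDO,    /* 3. page LSN < record LSN: apply the change */
    BLOCK_RESTORE_IMAGE  /* 4. record carries a full-page image: restore it */
} BlockRedoState;

BlockRedoState classify_block(bool has_image, bool page_exists,
                              XLogRecPtr page_lsn, XLogRecPtr record_lsn)
{
    assert(record_lsn != 0);  /* a real record always has a valid LSN */

    if (has_image)
        return BLOCK_RESTORE_IMAGE;
    if (!page_exists)
        return BLOCK_NOT_FOUND;
    if (page_lsn >= record_lsn)
        return BLOCK_ALREADY_DONE;
    return BLOCK_NEEDS_REDO;
}
```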
I think some special cases, such as Init Page, which is used in a few
operations (insert/update/multi-insert), might need separate handling.
Is there any criterion for deciding whether such a case should be a
separate state or handled specially by the operation?
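For comparison, this is roughly how the init-page case behaves in heap redo today: when the record says the page is to be initialized, the old contents are irrelevant, so the LSN check is skipped and the change is always applied to a freshly initialized page. ToyPage, ToyRecord and replay() are hypothetical; the page is shrunk to 64 bytes and the missing-page case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64  /* real pages are BLCKSZ, 8 kB by default */

typedef uint64_t XLogRecPtr;

typedef struct ToyPage
{
    XLogRecPtr    lsn;
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/* Hypothetical record: set one byte, optionally on a freshly initialized page. */
typedef struct ToyRecord
{
    XLogRecPtr    lsn;
    bool          init_page;  /* plays the role of XLOG_HEAP_INIT_PAGE */
    int           offset;
    unsigned char value;
} ToyRecord;

/* Returns true if the change was applied to the page. */
bool replay(ToyPage *page, const ToyRecord *rec)
{
    assert(rec->offset >= 0 && rec->offset < TOY_PAGE_SIZE);

    if (rec->init_page)
    {
        /* Old contents and LSN do not matter: start from an empty page. */
        memset(page, 0, sizeof *page);
    }
    else if (page->lsn >= rec->lsn)
        return false;  /* change already present on the page */

    page->data[rec->offset] = rec->value;
    page->lsn = rec->lsn;
    return true;
}
```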
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com