From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com> |
Subject: | Re: [PATCH 3/8] Add support for a generic wal reading facility dubbed XLogReader |
Date: | 2012-09-17 08:30:35 |
Message-ID: | 5056DFAB.3050707@vmware.com |
Lists: | pgsql-hackers |
On 17.09.2012 11:12, Andres Freund wrote:
> On Monday, September 17, 2012 09:40:17 AM Heikki Linnakangas wrote:
>> On 15.09.2012 03:39, Andres Freund wrote:
>> 2. We should focus on reading WAL; I don't see the point of mixing WAL
>> writing into this.
> If you write something that filters/analyzes and then forwards WAL, and you want
> to do that without a big overhead (i.e. completely reassembling everything, and
> then deassembling it again for writeout), it's hard to do that without
> integrating both sides.
It seems really complicated to filter/analyze WAL records without
reassembling them, anyway. The user of the facility is in charge of
reading the physical data, so you can still access the raw data, for
forwarding purposes, in addition to the reassembled records.
Or what exactly do you mean by "completely deassembling"? I read that to
mean dealing with page boundaries, i.e. if a record is split across
pages, copy parts into a contiguous temporary buffer.
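To make that reassembly step concrete, here is a minimal C sketch of copying a record that spans page boundaries into one contiguous buffer. It assumes a hypothetical, much-simplified layout (fixed DEMO_PAGE_SZ pages, each beginning with a DEMO_HDR_SZ-byte header, record data flowing through the rest). It is not PostgreSQL's actual WAL page format, and demo_reassemble_record is an illustrative name, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified layout, NOT PostgreSQL's real WAL page format:
 * every page is DEMO_PAGE_SZ bytes and begins with a DEMO_HDR_SZ-byte
 * header; record bytes flow through the rest of each page. */
#define DEMO_PAGE_SZ 32
#define DEMO_HDR_SZ  8

/* Copy a 'len'-byte record that starts at byte 'off' of page 'pageno' into
 * the contiguous buffer 'dst', skipping the header of each continuation
 * page. Returns the number of pages touched, or -1 if the 'npages' pages
 * supplied end before the record does. */
int demo_reassemble_record(const unsigned char *pages, size_t npages,
                           size_t pageno, size_t off,
                           unsigned char *dst, size_t len)
{
    size_t copied = 0;
    int touched = 0;

    assert(off >= DEMO_HDR_SZ && off < DEMO_PAGE_SZ);
    while (copied < len)
    {
        if (pageno >= npages)
            return -1;          /* rest of the record is not available yet */

        const unsigned char *page = pages + pageno * DEMO_PAGE_SZ;
        size_t avail = DEMO_PAGE_SZ - off;
        size_t chunk = (len - copied < avail) ? len - copied : avail;

        memcpy(dst + copied, page + off, chunk);
        copied += chunk;
        touched++;
        pageno++;
        off = DEMO_HDR_SZ;      /* continuation data starts after the header */
    }
    return touched;
}
```

The caller gets a single contiguous copy and never has to look at page headers itself, which is the kind of work the reading facility takes off its users.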
> Also, I want to read records incrementally/partially just as data comes in,
> which again is hard to combine with writing out the data again.
You mean, you want to start reading the first half of a record, before
the 2nd half is available? That seems complicated. I'd suggest keeping
it simple for now, and optimize later if necessary. Note that before you
have the whole WAL record, you cannot CRC check it, so you don't know if
it's in fact a valid WAL record.
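The CRC point can be shown in a few lines: the checksum covers every byte of the record, so a reader holding only a prefix can report "not known yet" but never "valid". This sketch uses a plain bitwise CRC-32 with the reflected IEEE polynomial purely for illustration; it makes no claim about which CRC variant or routine PostgreSQL uses, and demo_crc32 and demo_validate_record are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected IEEE polynomial 0xEDB88320.
 * Illustrative only; not PostgreSQL's CRC routine. */
uint32_t demo_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if the record is valid, 0 if the CRC does not match, and -1 if
 * only 'have' of 'total_len' bytes are present, because the CRC covers the
 * whole record and cannot be checked on a prefix. */
int demo_validate_record(const unsigned char *data, size_t have,
                         size_t total_len, uint32_t stored_crc)
{
    assert(data != NULL || have == 0);
    if (have < total_len)
        return -1;
    return demo_crc32(data, total_len) == stored_crc ? 1 : 0;
}
```

For the standard check string "123456789", this CRC-32 variant yields 0xCBF43926.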
>> I came up with the attached. I moved ReadRecord and some supporting
>> functions from xlog.c to xlogreader.c, and made it operate on
>> XLogReaderState instead of global variables. As discussed before,
>> I didn't like the callback-style API; I think the consumer of the API
>> should rather just call ReadRecord repeatedly to get each record. So
>> that's what I did.
> The problem with that kind of API is that, at least as far as I can see,
> it can never operate on incomplete/partial input. You need to buffer
> larger amounts of xlog somewhere, and you need to be aware of record boundaries.
> Both are things I dislike in a more generic user than xlog.c.
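For contrast, a push-style decoder that accepts input in arbitrary chunks has to keep any partial record in its own buffer until the rest arrives, then hand each completed record to the caller, here through an optional callback. The sketch below uses a toy framing assumed for illustration (a 2-byte big-endian length before each payload) and hypothetical names (DemoFeeder, demo_feed); it is not the API from either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy framing, assumed for illustration: 2-byte big-endian length, then
 * that many payload bytes. Records larger than DEMO_MAX_REC are rejected. */
#define DEMO_MAX_REC 256

typedef void (*DemoRecordCb) (const unsigned char *rec, size_t len, void *arg);

typedef struct DemoFeeder
{
    unsigned char pending[2 + DEMO_MAX_REC];   /* partial record so far */
    size_t      have;           /* bytes currently in pending */
    size_t      records;        /* complete records delivered */
    size_t      payload_bytes;  /* total payload bytes delivered */
    DemoRecordCb cb;            /* optional per-record callback */
    void       *arg;            /* passed through to cb */
} DemoFeeder;

void demo_feeder_init(DemoFeeder *f, DemoRecordCb cb, void *arg)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
    f->cb = cb;
    f->arg = arg;
}

/* Feed one chunk of any size. Returns 0 on success, or -1 if a length
 * prefix announces more than DEMO_MAX_REC bytes (the partial record is
 * dropped). */
int demo_feed(DemoFeeder *f, const unsigned char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        f->pending[f->have++] = chunk[i];
        if (f->have < 2)
            continue;           /* length prefix not complete yet */

        size_t need = ((size_t) f->pending[0] << 8) | f->pending[1];
        if (need > DEMO_MAX_REC)
        {
            f->have = 0;
            return -1;
        }
        if (f->have == 2 + need)
        {
            /* the whole record is now contiguous in pending[2..] */
            if (f->cb != NULL)
                f->cb(f->pending + 2, need, f->arg);
            f->records++;
            f->payload_bytes += need;
            f->have = 0;
        }
    }
    return 0;
}
```

The pending buffer and the DEMO_MAX_REC limit are the buffering cost of this design; a real decoder would also need to cope with records far larger than a fixed array.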
I don't understand that argument. A typical large WAL record is split
across 1-2 pages, maybe 3-4 at most, for an index page split record.
That doesn't feel like much to me. In extreme cases, a WAL record can be
much larger (e.g. a commit record of a transaction with a huge number of
subtransactions), but that should be rare in practice.
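The page counts above follow from simple arithmetic. A minimal sketch, assuming every page is page_sz bytes and each continuation page gives up hdr_sz bytes to its header (both are parameters here, not PostgreSQL's actual constants); demo_pages_spanned is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages touched by a record of 'len' bytes that starts at byte
 * 'off' of a page, when each continuation page loses 'hdr_sz' bytes to
 * its header. */
size_t demo_pages_spanned(size_t off, size_t len, size_t page_sz, size_t hdr_sz)
{
    assert(hdr_sz < page_sz && off < page_sz);
    if (len == 0)
        return 0;

    size_t first = page_sz - off;           /* room left on the first page */
    if (len <= first)
        return 1;

    size_t per_page = page_sz - hdr_sz;     /* payload room per continuation page */
    size_t rest = len - first;
    return 1 + (rest + per_page - 1) / per_page;    /* ceiling division */
}
```

For example, with 8192-byte pages and an assumed 24-byte header, a 20000-byte record starting at offset 8000 fills the last 192 bytes of its first page and needs ceil(19808 / 8168) = 3 more pages, 4 in total.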
The user of the facility doesn't need to be aware of record boundaries,
that's the responsibility of the facility. I thought that's exactly the
point of generalizing this thing, to make it unnecessary for the code
that uses it to be aware of such things.
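A minimal sketch of the pull-style API described in this thread, where the consumer calls a read function repeatedly and only the reader tracks where records start and end. DemoReader, demo_reader_init, demo_read_record and demo_count_records are hypothetical stand-ins, far simpler than XLogReaderState and ReadRecord in the attached patch, and the framing (a 2-byte big-endian length before each payload) is a toy format assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader state, far simpler than XLogReaderState. Input is a
 * sequence of records in a toy framing: 2-byte big-endian length, then
 * that many payload bytes. */
typedef struct DemoReader
{
    const unsigned char *buf;   /* raw input supplied by the caller */
    size_t      len;            /* bytes available in buf */
    size_t      pos;            /* offset of the next unread record */
} DemoReader;

void demo_reader_init(DemoReader *r, const unsigned char *buf, size_t len)
{
    assert(r != NULL && (buf != NULL || len == 0));
    r->buf = buf;
    r->len = len;
    r->pos = 0;
}

/* Return the next record's payload and store its length in *rec_len, or
 * return NULL if no complete record remains. Record boundaries are the
 * reader's business; the caller only ever sees whole records. */
const unsigned char *demo_read_record(DemoReader *r, size_t *rec_len)
{
    if (r->len - r->pos < 2)
        return NULL;            /* not even a length prefix left */

    size_t n = ((size_t) r->buf[r->pos] << 8) | r->buf[r->pos + 1];
    if (r->len - r->pos - 2 < n)
        return NULL;            /* trailing record is truncated */

    const unsigned char *payload = r->buf + r->pos + 2;
    r->pos += 2 + n;
    *rec_len = n;
    return payload;
}

/* Typical consumer: call the reader repeatedly until it runs dry. */
size_t demo_count_records(DemoReader *r)
{
    size_t n, count = 0;

    while (demo_read_record(r, &n) != NULL)
        count++;
    return count;
}
```

In this toy version a truncated trailing record just yields NULL, and the reader cannot be handed more input afterwards.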
> If you don't want the capability to forward/filter the data and read partial
> data without regard for record constraints/buffering, your patch seems to be
> quite a good start. It's missing xlogreader.h, though...
Ah sorry, patch with xlogreader.h attached.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
xlogreader-heikki-2.patch | text/x-diff | 47.6 KB |