From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Subject: | Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader |
Date: | 2012-06-14 21:38:33 |
Message-ID: | 201206142338.33897.andres@2ndquadrant.com |
Lists: | pgsql-hackers |
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> On 13.06.2012 14:28, Andres Freund wrote:
> > Features:
> > - streaming reading/writing
> > - filtering
> > - reassembly of records
> >
> > Reusing the ReadRecord infrastructure in situations where the code that
> > wants to do so is not tightly integrated into xlog.c is rather hard and
> > would require changes to rather integral parts of the recovery code
> > which doesn't seem to be a good idea.
> It would be nice to refactor ReadRecord and its subroutines out of xlog.c.
> That file has grown over the years to be really huge, and separating the
> code to read WAL sounds like it should be a pretty natural split. I
> don't want to duplicate all the WAL reading code, so we really should
> find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
> wrapper that just calls the new xlogreader code.
I agree that it is not very nice to duplicate it. But I also don't want to go
the route of replacing ReadRecord with it for the time being; we can replace
ReadRecord later if we want. As long as it is in flux like it is right now, I
don't really see the point in investing energy in that.
Also, I am not sure how a callback-oriented API would fit into the xlog.c
workflow.
> > Missing:
> > - "compressing" the stream when removing uninteresting records
> > - writing out correct CRCs
> > - validating CRCs
> > - separating reader/writer
>
> - comments.
> At a quick glance, I couldn't figure out how this works. There seems to
> be some callback functions? If you want to read an xlog stream using
> this facility, what do you do?
You currently have to fill out 4 callbacks:

    XLogReaderStateInterestingCB is_record_interesting;
    XLogReaderStateWriteoutCB writeout_data;
    XLogReaderStateFinishedRecordCB finished_record;
    XLogReaderStateReadPageCB read_page;

As an example of how to use it (from the walsender support for
START_LOGICAL_REPLICATION):

    if (!xlogreader_state) {
        xlogreader_state = XLogReaderAllocate();
        xlogreader_state->is_record_interesting =
            RecordRelevantForLogicalReplication;
        xlogreader_state->finished_record = ProcessRecord;
        xlogreader_state->writeout_data = WriteoutData;
        xlogreader_state->read_page = XLogReadPage;

        /* startptr is the current XLog position */
        xlogreader_state->startptr = startptr;

        XLogReaderReset(xlogreader_state);
    }

    /* how far does valid data go */
    xlogreader_state->endptr = endptr;

    XLogReaderRead(xlogreader_state);

The last step will then call the above callbacks until it reaches endptr. That
is, it first reads a page with "read_page", then checks whether a record is
interesting for the use case ("is_record_interesting"); if it is
interesting, it gets reassembled and passed to the "finished_record" callback.
Then the bytestream gets written out again with "writeout_data".
In this case it gets written to the buffer the walsender has allocated. In
other cases it might just get thrown away.
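
To make the control flow described above concrete, here is a minimal standalone sketch of a callback-driven read loop. It is not the code from the patch: every Demo* name, the record layout, and the simplification that one "page" holds exactly one record are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up record kinds; real WAL records look nothing like this. */
enum { DEMO_HEAP = 0, DEMO_INDEX = 1 };

typedef struct DemoRecord {
    int kind;
    int value;
} DemoRecord;

typedef struct DemoReader DemoReader;

/* Hypothetical stand-in for the reader state; only mirrors the four callbacks. */
struct DemoReader {
    size_t startptr;    /* next position to read */
    size_t endptr;      /* how far valid data goes */
    bool (*read_page)(DemoReader *r, size_t pos, DemoRecord *out);
    bool (*is_record_interesting)(DemoReader *r, const DemoRecord *rec);
    void (*finished_record)(DemoReader *r, const DemoRecord *rec);
    void (*writeout_data)(DemoReader *r, const DemoRecord *rec);
    /* demo-only state used by the callbacks below */
    const DemoRecord *source;
    size_t nsource;
    int finished_count;
    int written_count;
};

static bool demo_read_page(DemoReader *r, size_t pos, DemoRecord *out)
{
    if (pos >= r->nsource)
        return false;           /* nothing more to read */
    *out = r->source[pos];
    return true;
}

static bool demo_is_interesting(DemoReader *r, const DemoRecord *rec)
{
    (void) r;
    return rec->kind == DEMO_HEAP;  /* e.g. skip index changes */
}

static void demo_finished(DemoReader *r, const DemoRecord *rec)
{
    (void) rec;
    r->finished_count++;        /* a consumer would decode the record here */
}

static void demo_writeout(DemoReader *r, const DemoRecord *rec)
{
    (void) rec;
    r->written_count++;         /* a consumer would copy bytes to a buffer */
}

/* Drive the callbacks until endptr is reached or no more data is available. */
void demo_reader_read(DemoReader *r)
{
    assert(r->read_page && r->is_record_interesting &&
           r->finished_record && r->writeout_data);

    while (r->startptr < r->endptr) {
        DemoRecord rec;

        if (!r->read_page(r, r->startptr, &rec))
            break;
        if (r->is_record_interesting(r, &rec))
            r->finished_record(r, &rec);
        r->writeout_data(r, &rec);  /* the stream is written out either way */
        r->startptr++;
    }
}

/* Set up a reader over three made-up records and run it once. */
DemoReader demo_run(void)
{
    static const DemoRecord wal[] = {
        {DEMO_HEAP, 1}, {DEMO_INDEX, 2}, {DEMO_HEAP, 3}
    };
    DemoReader r = {0};

    r.read_page = demo_read_page;
    r.is_record_interesting = demo_is_interesting;
    r.finished_record = demo_finished;
    r.writeout_data = demo_writeout;
    r.source = wal;
    r.nsource = sizeof(wal) / sizeof(wal[0]);
    r.startptr = 0;
    r.endptr = r.nsource;

    demo_reader_read(&r);
    return r;
}
```

In this toy setup only the two heap records reach finished_record, while all three records pass through writeout_data. Reassembly of records that span pages is left out entirely.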
> Can this be used for writing WAL, as well as reading? If so, what do you
> need the write support for?
It can currently replace records which are not interesting (e.g. index changes
in the case of logical rep). Filtered records are currently replaced with
XLOG_NOOP records of the correct length. In the future the actual amount of
data should really be reduced. I don't yet know how to map LSNs of the
uncompressed/compressed streams onto each other...
The filtered data is then passed to a writeout callback (in a streaming
fashion).
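
As a rough illustration of the "replace with a no-op of the same length" idea, here is a small standalone sketch. The header layout, the DEMO_* record types and the function names are hypothetical and unrelated to the real XLogRecord format; CRCs are ignored completely, which matches the "Missing" list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up record types for the demo. */
enum { DEMO_REC_NOOP = 0, DEMO_REC_HEAP = 1, DEMO_REC_INDEX = 2 };

/* Hypothetical record header; not the real WAL record header. */
typedef struct DemoHeader {
    uint32_t total_len;     /* header plus payload, in bytes */
    uint8_t  type;
} DemoHeader;

/* Append one record at offset off and return the offset just past it. */
size_t demo_append(unsigned char *buf, size_t off, uint8_t type,
                   const char *payload)
{
    DemoHeader hdr;
    size_t paylen = strlen(payload);

    memset(&hdr, 0, sizeof hdr);    /* also clears padding bytes */
    hdr.total_len = (uint32_t) (sizeof hdr + paylen);
    hdr.type = type;
    memcpy(buf + off, &hdr, sizeof hdr);
    memcpy(buf + off + sizeof hdr, payload, paylen);
    return off + hdr.total_len;
}

/*
 * Overwrite every index record in buf with a no-op record of identical
 * length, so the offsets of all following records stay unchanged.
 * Returns the number of records replaced, or -1 if the stream is malformed.
 */
int demo_filter_stream(unsigned char *buf, size_t len)
{
    size_t off = 0;
    int replaced = 0;

    while (off + sizeof(DemoHeader) <= len) {
        DemoHeader hdr;

        memcpy(&hdr, buf + off, sizeof hdr);
        if (hdr.total_len < sizeof hdr || hdr.total_len > len - off)
            return -1;

        if (hdr.type == DEMO_REC_INDEX) {
            hdr.type = DEMO_REC_NOOP;
            memcpy(buf + off, &hdr, sizeof hdr);
            /* zero the payload but keep the record's length */
            memset(buf + off + sizeof hdr, 0, hdr.total_len - sizeof hdr);
            replaced++;
        }
        off += hdr.total_len;
    }
    return off == len ? replaced : -1;
}
```

Because the replacement keeps total_len intact, positions in the filtered stream still line up with positions in the original one. That is exactly the property that would be lost once the stream is actually compressed.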
The whole writing-out part is pretty ugly at the moment; I just bolted it
on top because it was convenient. I am not yet sure how the API
for that should look...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services