From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: Access method extendability |
Date: | 2016-03-29 16:45:25 |
Message-ID: | 56FAB125.4020401@2ndquadrant.com |
Lists: | pgsql-hackers |
On 29/03/16 18:25, Alvaro Herrera wrote:
>> >+ /*-------------------------------------------------------------------------
>> >+ * API for construction of generic xlog records
>> >+ *
>> >+ * This API allows the user to construct generic xlog records that describe
>> >+ * the difference between pages in a generic way. This is useful for
>> >+ * extensions that provide custom access methods, because they can't
>> >+ * register their own WAL redo routines.
>> >+ *
>> >+ * Each record must be constructed by following these steps:
>> >+ * 1) GenericXLogStart(relation) - start construction of a generic xlog
>> >+ * record for the given relation.
>> >+ * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>> >+ * for the record. This function returns a copy of the page
>> >+ * image where modifications can be performed. The second argument
>> >+ * indicates if the block is new (i.e. a full page image should be taken).
>> >+ * 3) Apply modifications to the page images obtained in the previous step.
>> >+ * 4) GenericXLogFinish() - finish construction of the generic xlog record.
>> >+ *
>> >+ * The xlog record construction can be canceled at any step by calling
>> >+ * GenericXLogAbort(). All changes made to the page image copies will be
>> >+ * discarded.
>> >+ *
>> >+ * Please note the following points when constructing generic xlog records.
>> >+ * - No direct modifications of page images are allowed! All modifications
>> >+ * must be done in the copies returned by GenericXLogRegister(). In other
>> >+ * words, the code that makes generic xlog records must never call
>> >+ * BufferGetPage().
>> >+ * - Registration of buffers (step 2) and modification of page images
>> >+ * (step 3) can be interleaved in any order. The only restriction is that
>> >+ * a page image can be modified only after its buffer has been registered.
>> >+ * - After registration, a buffer can also be unregistered by calling
>> >+ * GenericXLogUnregister(buffer). In this case the changes made to
>> >+ * that particular page image copy will be discarded.
>> >+ * - Generic xlog assumes that pages use the standard page layout, i.e., all
>> >+ * data between pd_lower and pd_upper will be discarded.
>> >+ * - The maximum number of buffers simultaneously registered for a generic
>> >+ * xlog record is MAX_GENERIC_XLOG_PAGES. An error will be thrown if this
>> >+ * limit is exceeded.
>> >+ * - Since you modify copies of page images, GenericXLogStart() doesn't
>> >+ * start a critical section. Thus, memory allocation, error throwing, etc.
>> >+ * are allowed between GenericXLogStart() and GenericXLogFinish().
>> >+ * The actual critical section is inside GenericXLogFinish().
>> >+ * - GenericXLogFinish() takes care of marking buffers dirty and setting their
>> >+ * LSNs. You don't need to do this explicitly.
>> >+ * - For unlogged relations, everything works the same except that no WAL
>> >+ * record is produced. Thus, you typically don't need to do any explicit
>> >+ * checks for unlogged relations.
>> >+ * - If a registered buffer isn't new, the generic xlog record contains the
>> >+ * delta between the old and new page images. This delta is produced by
>> >+ * byte-by-byte comparison. The current delta mechanism is not effective
>> >+ * for data shifted within the page and may be improved in the future.
>> >+ * - The generic xlog redo function will acquire exclusive locks on buffers
>> >+ * in the same order they were registered. After redo of all changes,
>> >+ * the locks will be released in the same order.
>> >+ *
>> >+ *
>> >+ * Internally, the delta between pages consists of a set of fragments. Each
>> >+ * fragment represents the changes made in a given region of the page. A
>> >+ * fragment is described as follows:
>> >+ *
>> >+ * - offset of page region (OffsetNumber)
>> >+ * - length of page region (OffsetNumber)
>> >+ * - data - the data to place into described region ('length' number of bytes)
>> >+ *
>> >+ * Unchanged regions of a page are not represented in the delta. As a
>> >+ * result, the delta can be more compact than a full page image. However,
>> >+ * splitting at an unchanged region no longer than a fragment header
>> >+ * (offset and length) saves nothing, because the new header costs at least
>> >+ * as many bytes as are skipped. For this reason we break into separate
>> >+ * fragments only if the unchanged region is longer than MATCH_THRESHOLD.
>> >+ *
>> >+ * The worst case for delta size is when no unchanged region is found in
>> >+ * the page at all. The delta is then a single fragment covering the whole
>> >+ * page, so its size is the size of the page plus the size of one fragment
>> >+ * header.
>> >+ */
>> >+ #define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber))
>> >+ #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE
>> >+ #define MAX_DELTA_SIZE (BLCKSZ + FRAGMENT_HEADER_SIZE)
>
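The fragment format and the MATCH_THRESHOLD rule in the quoted comment can be illustrated with a small standalone C sketch. It is not the code from the patch: it assumes a 64-byte page instead of BLCKSZ, uses uint16_t as a stand-in for OffsetNumber, ignores the pd_lower/pd_upper hole, and `compute_delta()`, `apply_delta()` and `write_fragment()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small page for illustration (assumption); the patch works on BLCKSZ pages. */
#define PAGE_SIZE 64

typedef uint16_t OffsetNumber; /* stand-in for PostgreSQL's OffsetNumber */

#define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber))
#define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE
#define MAX_DELTA_SIZE (PAGE_SIZE + FRAGMENT_HEADER_SIZE)

/* Append one fragment: offset, length, then 'length' bytes of new data. */
static size_t
write_fragment(char *delta, size_t pos, OffsetNumber offset,
               OffsetNumber length, const char *data)
{
    memcpy(delta + pos, &offset, sizeof(offset));
    pos += sizeof(offset);
    memcpy(delta + pos, &length, sizeof(length));
    pos += sizeof(length);
    memcpy(delta + pos, data, length);
    return pos + length;
}

/*
 * Byte-by-byte comparison of two page images. A run of at most
 * MATCH_THRESHOLD unchanged bytes is folded into the current fragment,
 * because splitting there would add a header at least as large as the run.
 * Returns the number of bytes written to 'delta'.
 */
static size_t
compute_delta(const char *oldpage, const char *newpage, char *delta)
{
    size_t pos = 0;
    size_t i = 0;

    while (i < PAGE_SIZE)
    {
        if (oldpage[i] == newpage[i])
        {
            i++;
            continue;
        }

        size_t start = i;   /* first changed byte of the fragment */
        size_t end = i + 1; /* one past the last changed byte seen so far */

        /* Extend while the unchanged gap since 'end' stays short enough. */
        for (size_t j = end; j < PAGE_SIZE && j - end <= MATCH_THRESHOLD; j++)
            if (oldpage[j] != newpage[j])
                end = j + 1;

        pos = write_fragment(delta, pos, (OffsetNumber) start,
                             (OffsetNumber) (end - start), newpage + start);
        i = end;
    }
    assert(pos <= MAX_DELTA_SIZE);
    return pos;
}

/* Redo side: overwrite each described region of 'page' with its data. */
static void
apply_delta(char *page, const char *delta, size_t deltalen)
{
    size_t pos = 0;

    while (pos < deltalen)
    {
        OffsetNumber offset;
        OffsetNumber length;

        memcpy(&offset, delta + pos, sizeof(offset));
        pos += sizeof(offset);
        memcpy(&length, delta + pos, sizeof(length));
        pos += sizeof(length);
        assert((size_t) offset + length <= PAGE_SIZE);
        memcpy(page + offset, delta + pos, length);
        pos += length;
    }
}
```

For example, starting from an all-zero page and changing bytes 3, 5 and 40 produces two fragments: bytes 3 to 5 (the single unchanged byte 4 is folded in, since it is below MATCH_THRESHOLD) and byte 40 alone, for 12 bytes of delta. A page where every byte changed hits MAX_DELTA_SIZE, 68 bytes with this page size.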
I incorporated your changes and made some additional refinements on top
of them.
Attached is a delta against v12, which should cause fewer issues when
Teodor merges it.
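The construction rules in the quoted comment (modifications go to private copies, unregistering or aborting discards them, and only GenericXLogFinish() touches the shared pages) can be modeled with a toy in-memory state. Everything below is hypothetical: the `toy_*` functions, `TOY_PAGE_SIZE` and `TOY_MAX_PAGES` are illustrative stand-ins rather than the patch's API, the isNew flag is omitted, and WAL insertion, the critical section, dirtying buffers and setting LSNs are reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_MAX_PAGES 4 /* hypothetical stand-in for MAX_GENERIC_XLOG_PAGES */

typedef struct
{
    bool  active;               /* is this slot currently registered? */
    char *buffer;               /* the "shared" page the caller registered */
    char  image[TOY_PAGE_SIZE]; /* private copy returned for modification */
} ToyPage;

typedef struct
{
    ToyPage pages[TOY_MAX_PAGES];
} ToyXLogState;

/* Step 1: begin a record with no registered pages. */
static void
toy_start(ToyXLogState *st)
{
    for (int i = 0; i < TOY_MAX_PAGES; i++)
        st->pages[i].active = false;
}

/*
 * Step 2: register a page and return a private copy of it. Returns NULL when
 * all slots are in use (the real API throws an error instead).
 */
static char *
toy_register(ToyXLogState *st, char *buffer)
{
    for (int i = 0; i < TOY_MAX_PAGES; i++)
    {
        ToyPage *p = &st->pages[i];

        if (!p->active)
        {
            p->active = true;
            p->buffer = buffer;
            memcpy(p->image, buffer, TOY_PAGE_SIZE);
            return p->image;
        }
    }
    return NULL;
}

/* Drop a registration; edits made to that copy are discarded. */
static void
toy_unregister(ToyXLogState *st, const char *buffer)
{
    for (int i = 0; i < TOY_MAX_PAGES; i++)
        if (st->pages[i].active && st->pages[i].buffer == buffer)
            st->pages[i].active = false;
}

/*
 * Step 4: publish the copies. The real GenericXLogFinish() does this inside
 * a critical section, together with WAL insertion, marking buffers dirty and
 * setting their LSNs.
 */
static void
toy_finish(ToyXLogState *st)
{
    for (int i = 0; i < TOY_MAX_PAGES; i++)
    {
        ToyPage *p = &st->pages[i];

        if (p->active)
        {
            memcpy(p->buffer, p->image, TOY_PAGE_SIZE);
            p->active = false;
        }
    }
}

/* Cancel at any step: all copies are discarded, shared pages are untouched. */
static void
toy_abort(ToyXLogState *st)
{
    toy_start(st);
}
```

The point of the model is that nothing written through a registered copy reaches the shared page until the finish step, which is why, as the comment says, no critical section is needed before GenericXLogFinish().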
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
generic-xlog-12-delta.patch | text/plain | 7.8 KB |