From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org |
Subject: | Re: [PATCHES] Full page writes improvement, code update |
Date: | 2007-05-17 00:34:57 |
Message-ID: | 200705170034.l4H0YvM28308@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
---------------------------------------------------------------------------
Koichi Suzuki wrote:
> Hi,
>
> As replied to "Patch queue triage" by Tom, here's simplified patch to
> mark WAL record as "compressable", with no increase in WAL itself.
> Compression/decompression commands will be posted separately to PG
> Foundary for further review.
>
> -------------------------------
> As suggested by Tom, I agree that WAL should not include "both" full
> page write and incremental (logical) log. I began to examine WAL
> record format to see if incremental log can be made from full page
> writes. It will be okay even before 8.4, if simplified patch to the
> core is accepted. I will post simplified patch to the core as follows:
>
> 1. Mark the flag to indicate that the WAL record is compressable from
> full page writes to incremental log. This flag will be set if
> a) It is not written during the hot backup, and
> b) Archive command is active, and
> c) WAL contains full page writes, and
> d) full_page_writes=on.
> No logical log will be written to WAL in this case, and
> 2. During recovery, xl_tot_len check will be skipped for compressed WAL
> records.
>
> Please note that new GUC is not needed in this patch.
>
> With this patch, compress/decompress can be developped outside the core.
>
> I'd be very grateful if this patch can be considered again.
>
> Best Regards;
>
> --
> -------------
> Koichi Suzuki
> diff -cr pgsql_org/src/backend/access/transam/xlog.c pgsql/src/backend/access/transam/xlog.c
> *** pgsql_org/src/backend/access/transam/xlog.c 2007-05-02 15:56:38.000000000 +0900
> --- pgsql/src/backend/access/transam/xlog.c 2007-05-07 16:30:38.000000000 +0900
> ***************
> *** 837,842 ****
> --- 837,854 ----
> return RecPtr;
> }
>
> + /*
> + * If online backup is not in progress and WAL archiving is active, mark
> + * backup blocks removable if any.
> + * This mark will be referenced during archiving to remove needless backup
> + * blocks in the record and compress WAL segment files.
> + */
> + if (XLogArchivingActive() && fullPageWrites &&
> + (info & XLR_BKP_BLOCK_MASK) && !Insert->forcePageWrites)
> + {
> + info |= XLR_BKP_REMOVABLE;
> + }
> +
> /* Insert record header */
>
> record = (XLogRecord *) Insert->currpos;
> ***************
> *** 2738,2750 ****
> blk += blen;
> }
>
> ! /* Check that xl_tot_len agrees with our calculation */
> ! if (blk != (char *) record + record->xl_tot_len)
> {
> ! ereport(emode,
> ! (errmsg("incorrect total length in record at %X/%X",
> ! recptr.xlogid, recptr.xrecoff)));
> ! return false;
> }
>
> /* Finally include the record header */
> --- 2750,2778 ----
> blk += blen;
> }
>
> ! /*
> ! * If physical log has not been removed, check the length to see
> ! * the following.
> ! * - No physical log existed originally,
> ! * - WAL record was not removable because it is generated during
> ! * the online backup,
> ! * - Cannot be removed because the physical log spanned in
> ! * two segments.
> ! * The reason why we skip the length check on the physical log removal is
> ! * that the flag XLR_SET_BKB_BLOCK(0..2) is reset to zero and it prevents
> ! * the above loop to proceed blk to the end of the record.
> ! */
> ! if (!(record->xl_info & XLR_BKP_REMOVABLE) ||
> ! record->xl_info & XLR_BKP_BLOCK_MASK)
> {
> ! /* Check that xl_tot_len agrees with our calculation */
> ! if (blk != (char *) record + record->xl_tot_len)
> ! {
> ! ereport(emode,
> ! (errmsg("incorrect total length in record at %X/%X",
> ! recptr.xlogid, recptr.xrecoff)));
> ! return false;
> ! }
> }
>
> /* Finally include the record header */
> pgsql/src/backend/access/transam: xlog.c.orig
> diff -cr pgsql_org/src/include/access/xlog.h pgsql/src/include/access/xlog.h
> *** pgsql_org/src/include/access/xlog.h 2007-01-06 07:19:51.000000000 +0900
> --- pgsql/src/include/access/xlog.h 2007-05-07 16:30:38.000000000 +0900
> ***************
> *** 66,73 ****
> /*
> * If we backed up any disk blocks with the XLOG record, we use flag bits in
> * xl_info to signal it. We support backup of up to 3 disk blocks per XLOG
> ! * record. (Could support 4 if we cared to dedicate all the xl_info bits for
> ! * this purpose; currently bit 0 of xl_info is unused and available.)
> */
> #define XLR_BKP_BLOCK_MASK 0x0E /* all info bits used for bkp blocks */
> #define XLR_MAX_BKP_BLOCKS 3
> --- 66,74 ----
> /*
> * If we backed up any disk blocks with the XLOG record, we use flag bits in
> * xl_info to signal it. We support backup of up to 3 disk blocks per XLOG
> ! * record.
> ! * Bit 0 of xl_info is used to represent that backup blocks are not necessary
> ! * in archive-log.
> */
> #define XLR_BKP_BLOCK_MASK 0x0E /* all info bits used for bkp blocks */
> #define XLR_MAX_BKP_BLOCKS 3
> ***************
> *** 75,80 ****
> --- 76,82 ----
> #define XLR_BKP_BLOCK_1 XLR_SET_BKP_BLOCK(0) /* 0x08 */
> #define XLR_BKP_BLOCK_2 XLR_SET_BKP_BLOCK(1) /* 0x04 */
> #define XLR_BKP_BLOCK_3 XLR_SET_BKP_BLOCK(2) /* 0x02 */
> + #define XLR_BKP_REMOVABLE XLR_SET_BKP_BLOCK(3) /* 0x01 */
>
> /*
> * Sometimes we log records which are out of transaction control.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
From | Date | Subject | |
---|---|---|---|
Next Message | Marc G. Fournier | 2007-05-17 00:36:42 | Re: Lack of urgency in 8.3 reviewing |
Previous Message | Tom Lane | 2007-05-17 00:19:51 | Re: Testing concurrent psql |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2007-05-17 01:07:31 | Re: Implemented current_query |
Previous Message | Tom Lane | 2007-05-17 00:00:41 | Re: [DOCS] Autovacuum and XID wraparound |