From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | noah(at)leadboat(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz |
Subject: | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date: | 2019-05-23 07:10:35 |
Message-ID: | 20190523.161035.171704812.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
Attached is a new version.
At Tue, 21 May 2019 21:29:48 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190521(dot)212948(dot)34357392(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Mon, 20 May 2019 15:54:30 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190520(dot)155430(dot)215084510(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > I suspect the design in the https://postgr.es/m/559FA0BA.3080808@iki.fi last
> > > paragraph will be simpler, not more complex. In the implementation I'm
> > > envisioning, smgrDoPendingDeletes() would change name, perhaps to
> > > AtEOXact_Storage(). For every relfilenode it does not delete, it would ensure
> > > durability by syncing (for large nodes) or by WAL-logging each page (for small
> > > nodes). RelationNeedsWAL() would return false whenever the applicable
> > > relfilenode appears in pendingDeletes. Access methods would remove their
> > > smgrimmedsync() calls, but they would otherwise not change. Would anyone like
> > > to try implementing that?
> >
> > Following this direction, the attached PoC works *at least for*
> > the wal_optimization TAP tests, but doing pending flush not in
> > smgr but in relcache. This is extending skip-wal feature to
> > indexes. And makes the old 0002 patch on nbtree useless.
>
> This is a tidier version of the patch.
>
> - Passes regression tests including 018_wal_optimize.pl
>
> - Move the substantial work to table/index AMs.
>
> Each AM can decide whether to support WAL skip or not.
> Currently heap and nbtree support it.
>
> - The timing of sync is moved from AtEOXact to PreCommit. This is
> because heap_sync() needs xact state = INPROGRESS.
>
> - matview and cluster is broken, since swapping to new
> relfilenode doesn't change rd_newRelfilenodeSubid. I'll address
> that.
cluster/matview are fixed.
An obstacle to fixing them was the unreliability of
newRelfilenodeSubid. As mentioned in the comment on
RelationData, newRelfilenodeSubid may disappear after a certain
sequence of commands.
In the attached v14, I added "rd_firstRelfilenodeSubid", which
stores the subtransaction id of the first relfilenode
replacement in the current transaction. It survives any sequence
of commands, including the one mentioned in CopyFrom's comment
(which I removed in this patch).
With the attached patch, for relations based on table/index AMs
that support WAL-skipping, WAL-logging is skipped if the
relation was created in the current transaction, or if its
relfilenode was replaced in the current transaction. The file
sync at commit is always performed. (Only heap and btree support
it.)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v14-0001-TAP-test-for-copy-truncation-optimization.patch | text/x-patch | 10.7 KB |
v14-0002-Fix-WAL-skipping-feature.patch | text/x-patch | 37.3 KB |