From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | noah(at)leadboat(dot)com |
Cc: | robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, michael(at)paquier(dot)xyz |
Subject: | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date: | 2019-11-28 11:56:20 |
Message-ID: | 20191128.205620.2015649987051831334.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
At Tue, 26 Nov 2019 21:37:52 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> It is not fully checked. I haven't merged it or measured performance yet,
> but I post the status-quo patch for now.
It was actually an inconsistency caused by swap_relation_files.
1. rd_createSubid of the relcache entry for r2 is not reset. This
prevents the relcache entry from being flushed. Commit processes
pendingSyncs and leaves the relcache entry with rd_createSubid !=
Invalid, which is an inconsistency.
2. relation_open(r1) returns a relcache entry whose relfilenode still
has the old value (relfilenode1), since the command counter has not
been incremented. On the other hand, if it is incremented just before,
AssertPendingSyncConsistency() aborts because of the inconsistency
between relfilenode and rd_firstRel*.
As a result, I have come back to thinking that we need to update both
relcache entries with the right relfilenode.
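To illustrate the two symptoms above, here is a toy model in C. It is only a sketch and not PostgreSQL code: ToyEntry, toy_consistent and both swap helpers are hypothetical names, the single new_in_xact flag stands in for rd_createSubid and rd_firstRel*, and the invariant is a simplified assumption about what AssertPendingSyncConsistency() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a relcache entry; only two fields are modeled. */
typedef struct ToyEntry
{
    unsigned relfilenode;   /* storage the entry currently points at */
    bool     new_in_xact;   /* models rd_createSubid / rd_firstRel* being set */
} ToyEntry;

/*
 * Toy invariant (an assumption, not the real assertion): an entry is marked
 * new_in_xact if and only if its relfilenode has a pending sync.
 */
static bool
toy_consistent(const ToyEntry *e, const unsigned *pending, size_t npending)
{
    bool has_pending = false;

    assert(e != NULL);
    for (size_t i = 0; i < npending; i++)
        if (pending[i] == e->relfilenode)
            has_pending = true;
    return has_pending == e->new_in_xact;
}

/* Swap only the relfilenodes; the per-entry flags are left stale. */
static void
toy_swap_filenode_only(ToyEntry *a, ToyEntry *b)
{
    unsigned tmp = a->relfilenode;

    a->relfilenode = b->relfilenode;
    b->relfilenode = tmp;
}

/* Swap the relfilenodes and move the flags along with the storage. */
static void
toy_swap_both(ToyEntry *a, ToyEntry *b)
{
    ToyEntry tmp = *a;

    *a = *b;
    *b = tmp;
}
```

With r1 = {1, false}, r2 = {2, true} and relfilenode 2 pending sync, toy_swap_filenode_only leaves both entries violating the toy invariant, while toy_swap_both keeps them consistent.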
I once thought that taking AccessExclusiveLock (AEL) in the function
has no side effects, but the code path is also executed when
wal_level = replica or higher. And as I mentioned upthread, we can
even get there without holding any lock on r1, or sometimes holding
only ShareLock. So upgrading to AEL would emit Standby/LOCK WAL
records, which propagate to the standby. In the end, I'd like to take
the weakest lock (AccessShareLock) there.
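The lock-level reasoning can be sketched as a small predicate. This is a toy model under my understanding that only AccessExclusiveLock on a relation is WAL-logged for standbys, and only when wal_level is replica or higher. The TOY_ enums and toy_lock_emits_standby_wal are hypothetical names, not the server's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy copies of the table-level lock modes, weakest to strongest. */
typedef enum ToyLockMode
{
    TOY_NoLock = 0,
    TOY_AccessShareLock,
    TOY_RowShareLock,
    TOY_RowExclusiveLock,
    TOY_ShareUpdateExclusiveLock,
    TOY_ShareLock,
    TOY_ShareRowExclusiveLock,
    TOY_ExclusiveLock,
    TOY_AccessExclusiveLock
} ToyLockMode;

/* Toy copies of the wal_level settings. */
typedef enum ToyWalLevel
{
    TOY_WAL_MINIMAL = 0,
    TOY_WAL_REPLICA,
    TOY_WAL_LOGICAL
} ToyWalLevel;

/*
 * Returns true if acquiring the lock would, in this model, write a
 * Standby/LOCK record that a standby has to replay.
 */
static bool
toy_lock_emits_standby_wal(ToyLockMode mode, ToyWalLevel wal_level)
{
    assert(mode >= TOY_NoLock && mode <= TOY_AccessExclusiveLock);
    return mode >= TOY_AccessExclusiveLock && wal_level >= TOY_WAL_REPLICA;
}
```

In this model, upgrading from ShareLock to AccessExclusiveLock changes the result from false to true at wal_level = replica, whereas AccessShareLock never produces the record.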
Attached is the new version of the patch.
- v26-0001-version-nm24.patch
  Same as v24.
- v26-0002-change-swap_relation_files.patch
  Changes to swap_relation_files as mentioned above.
- v26-0003-Improve-the-performance-of-relation-syncs.patch
  Performs multiple pending syncs in a single scan of shared_buffers.
- v26-0004-Revert-FlushRelationBuffersWithoutRelcache.patch
  v26-0003 makes the function useless, so this removes it.
- v26-0005-Fix-gistGetFakeLSN.patch
  gistGetFakeLSN fix.
- v26-0006-Sync-files-shrinked-by-truncation.patch
  Fixes the problem of the commit-time FPI after a truncation that
  follows a checkpoint. I'm not sure this is the right direction, but
  pendingSyncHash is removed from the pendingDeletes list again.
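As a rough illustration of the idea behind v26-0003 (one scan of the buffer pool serving several pending syncs), here is a toy sketch. It is not the patch's code: ToyBuffer, toy_cmp_unsigned and toy_flush_relations are hypothetical names, and clearing the dirty flag merely stands in for writing the page out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared buffer header. */
typedef struct ToyBuffer
{
    unsigned relfilenode;   /* relation storage the page belongs to */
    int      dirty;         /* nonzero if the page needs writing */
} ToyBuffer;

/* Three-way comparison for qsort() and bsearch() over unsigned keys. */
static int
toy_cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *) a;
    unsigned y = *(const unsigned *) b;

    return (x > y) - (x < y);
}

/*
 * Flush dirty buffers of every relation listed in rels[] with a single
 * pass over the pool. Sorts rels[] in place so that each buffer can be
 * matched by binary search. Returns the number of buffers flushed.
 */
static size_t
toy_flush_relations(ToyBuffer *pool, size_t nbuf, unsigned *rels, size_t nrels)
{
    size_t flushed = 0;

    assert(pool != NULL || nbuf == 0);
    if (nrels == 0)
        return 0;

    qsort(rels, nrels, sizeof(unsigned), toy_cmp_unsigned);

    for (size_t i = 0; i < nbuf; i++)
    {
        if (!pool[i].dirty)
            continue;
        if (bsearch(&pool[i].relfilenode, rels, nrels,
                    sizeof(unsigned), toy_cmp_unsigned) != NULL)
        {
            pool[i].dirty = 0;  /* stand-in for writing the page out */
            flushed++;
        }
    }
    return flushed;
}
```

Compared with scanning the pool once per relation, the cost in this sketch drops from roughly nrels * nbuf buffer checks to nbuf binary searches.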
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v26-0001-version-nm24.patch | text/x-patch | 70.5 KB |
v26-0002-change-swap_relation_files.patch | text/x-patch | 2.4 KB |
v26-0003-Improve-the-performance-of-relation-syncs.patch | text/x-patch | 8.5 KB |
v26-0004-Revert-FlushRelationBuffersWithoutRelcache.patch | text/x-patch | 3.3 KB |
v26-0005-Fix-gistGetFakeLSN.patch | text/x-patch | 5.7 KB |
v26-0006-Sync-files-shrinked-by-truncation.patch | text/x-patch | 9.8 KB |