From: | "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com> |
---|---|
To: | "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: [Patch] Optimize dropping of relation buffers using dlist |
Date: | 2020-10-15 03:34:09 |
Message-ID: | OSBPR01MB2341672E9A95E5EC6D2E79B5EF020@OSBPR01MB2341.jpnprd01.prod.outlook.com |
Lists: pgsql-hackers
On Tuesday, October 13, 2020 10:09 AM, Tsunakawa-san wrote:
> (1)
> Why do you do it this way? I think the previous patch was more correct
> (while agreeing with Horiguchi-san that nTotalBlocks may be unnecessary).
> What you want to do is "if the size of any fork could be inaccurate, do the
> traditional full buffer scan without performing any optimization for any
> fork," right? But the above code performs optimization for forks until it
> finds a fork with inaccurate size.
>
> (2)
> + * Get the total number of cached blocks and to-be-invalidated
> blocks
> + * of the relation. The cached value returned by smgrnblocks could
> be
> + * smaller than the actual number of existing buffers of the file.
>
> As you changed the meaning of the smgrnblocks() argument from cached to
> accurate, and you no longer calculate the total blocks, the comment should
> reflect that.
>
>
> (3)
> In smgrnblocks(), accurate is not set to false when mdnblocks() is called.
> The caller doesn't initialize the value either, so it can see a garbage value.
>
>
> (4)
> +        if (nForkBlocks[i] != InvalidBlockNumber &&
> +            nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
> +        {
> ...
> +        }
> +        else
> +            break;
> +    }
>
> In cases like this, it's better to reverse the if and else. Thus, you can reduce
> the nest depth.
Thank you for the review!
1. I have revised the patch to address your comments/feedback. Attached is the latest set of patches; simplified sketches of how points (1) through (4) are now handled follow below.
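For (1), the revised loop no longer optimizes fork by fork: it first obtains every fork's size and requires all of them to be accurate, and any inaccurate fork makes us fall back to the full scan for all forks. A minimal sketch of the intended control flow (illustrative variable names, not the exact patch hunk):

    bool        all_accurate = true;
    BlockNumber nBlocksToInvalidate = 0;

    /* First pass: get every fork's size; require all of them to be exact. */
    for (i = 0; i < nforks; i++)
    {
        bool        accurate;

        nForkBlocks[i] = smgrnblocks(smgr_reln, forkNum[i], &accurate);
        if (!accurate)
        {
            all_accurate = false;   /* one inaccurate fork disables everything */
            break;
        }
        nBlocksToInvalidate += (nForkBlocks[i] - firstDelBlock[i]);
    }

    /* Optimize only when every size is exact and few blocks are affected. */
    if (!all_accurate || nBlocksToInvalidate >= BUF_DROP_FULL_SCAN_THRESHOLD)
    {
        /* traditional full scan of shared buffers, covering all forks */
    }
    else
    {
        /* targeted invalidation of just the to-be-dropped blocks per fork */
    }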
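For (2), I rewrote the comment above that block along these lines (wording is illustrative, not the final patch text):

    /*
     * Get the number of blocks in each fork of the relation via
     * smgrnblocks(), and count the blocks to be invalidated.  The
     * returned size is usable for optimization only when "accurate"
     * is set; otherwise we must fall back to the full buffer scan.
     */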
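For (3), smgrnblocks() now sets the out-parameter on every path, including the mdnblocks() fallback, so the caller can never see a garbage value. A simplified sketch of the revised function:

    BlockNumber
    smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *accurate)
    {
        BlockNumber result;

        /* Initialize the out-parameter so no path returns garbage. */
        if (accurate != NULL)
            *accurate = false;

        /* The cached value is trusted as exact only while in recovery. */
        if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
        {
            if (accurate != NULL)
                *accurate = true;
            return reln->smgr_cached_nblocks[forknum];
        }

        /* Fall back to asking the storage manager (mdnblocks() for md). */
        result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);
        reln->smgr_cached_nblocks[forknum] = result;

        return result;
    }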
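For (4), I reversed the if and else as suggested, so the bail-out case is handled first and the main body is one nesting level shallower. Applied to the quoted hunk, it reads roughly like:

    for (i = 0; i < nforks; i++)
    {
        /* Handle the bail-out case first to avoid nesting the main work. */
        if (nForkBlocks[i] == InvalidBlockNumber ||
            nBlocksToInvalidate >= BUF_DROP_FULL_SCAN_THRESHOLD)
            break;      /* give up and fall back to the full scan */

        /* ... optimized invalidation for this fork ... */
    }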
2. Non-recovery Performance
I also included a debug version of the patch (0004), in which I removed the recovery-related checks
in order to measure non-recovery performance.
However, I still can't find the cause of why the non-recovery performance
does not improve compared to master (1 min 15 s for the test case below).
> - if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
> + if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
Here's how I measured it:
0. postgresql.conf setting
shared_buffers = 100GB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 100
wal_log_hints = on
wal_keep_size = 100
max_wal_size = 20GB
1. createdb test
2. Create tables: SELECT create_tables(1000);
create or replace function create_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'create table tab_' || i::text || ' (a int);';
    execute query_string;
  end loop;
end;
$$ language plpgsql;
3. Insert rows into the tables (3.5 GB database): SELECT insert_tables(1000);
create or replace function insert_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'insert into tab_' || i::text || ' SELECT generate_series(1, 100000);';
    execute query_string;
  end loop;
end;
$$ language plpgsql;
4. DELETE FROM tables: SELECT delfrom_tables(1000);
create or replace function delfrom_tables(numtabs int)
returns void as $$
declare query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'delete from tab_' || i::text;
    execute query_string;
  end loop;
end;
$$ language plpgsql;
5. Measure VACUUM timing. (VACUUM truncates the now-empty tables, which invokes DropRelFileNodeBuffers for each relation.)
\timing
VACUUM;
Using the debug version of the patch, I have confirmed that it enters the optimization path
when the conditions are met. Here are some of the logs printed in 018_wal_optimize_node_replica.log:
> make world -j4 -s && make -C src/test/recovery/ check PROVE_TESTS=t/018_wal_optimize.pl
WARNING: current fork 0, nForkBlocks[i] 1, accurate: 1
CONTEXT: WAL redo at 0/162B4E0 for Storage/TRUNCATE: base/13751/24577 to 0 blocks flags 7
WARNING: Optimization Loop.
buf_id = 41. nforks = 1. current fork = 0. forkNum: 0 == tag's forkNum: 0. curBlock: 0 < nForkBlocks[i] = 1. tag blockNum: 0 >= firstDelBlock[i]: 0. nBlocksToInvalidate = 1 < threshold = 32.
--
3. Recovery Performance (hot standby, failover)
OTOH, when measuring recovery performance (using the 0003 patch), the results were great.
Failover times below are in seconds; %reg = (patched - master) / patched:
| s_b | master | patched | %reg |
|-------|--------|---------|--------|
| 128MB | 3.043 | 2.977 | -2.22% |
| 1GB | 3.417 | 3.410 | -0.21% |
| 20GB | 20.597 | 2.409 | -755% |
| 100GB | 66.862 | 2.409 | -2676% |
To measure this on a hot standby setup (after inserting rows into the tables):
1. [Standby] Pause WAL replay
SELECT pg_wal_replay_pause();
2. [Master] Measure VACUUM timing. Then stop server.
\timing
VACUUM;
\q
pg_ctl stop -mi -w
3. [Standby] Use the attached script to promote standby and measure the performance.
# test.sh recovery
So the issue I'm still investigating is why the non-recovery performance does not improve,
while OTOH it improves greatly when InRecovery.
Regards,
Kirk Jamison
Attachment | Content-Type | Size
---|---|---
v25-0001-Prevent-invalidating-blocks-in-smgrextend-during.patch | application/octet-stream | 1.1 KB
v25-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch | application/octet-stream | 8.7 KB
v25-0003-Optimize-DropRelFileNodeBuffers-during-recovery.patch | application/octet-stream | 7.5 KB
v25-0004-V25-with-ereport-for-debug.patch | application/octet-stream | 2.9 KB
018_wal_optimize_node_replica.log | application/octet-stream | 138.5 KB
test.sh | application/octet-stream | 405 bytes