Re: [PATCH] Speedup truncates of relation forks

From: Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>
To: "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Speedup truncates of relation forks
Date: 2019-07-03 09:39:36
Message-ID: 2950c507-7487-55e4-574a-02b215a1abf2@anayrat.info
Lists: pgsql-hackers

On 7/1/19 12:55 PM, Jamison, Kirk wrote:
> On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote:
>> As far as I remember, you should see "relation" wait events (type Lock) on
>> the standby server. This is due to the startup process acquiring an
>> AccessExclusiveLock for the truncation, and other backends waiting to
>> acquire a lock to read the table.
>
> Hi Adrien, thank you for taking time to reply.
>
> I understand that RelationTruncate() can block read-only queries on
> the standby during redo. However, it's difficult for me to reproduce a
> test case that catches that wait on the relation lock, because one has
> to execute the SELECT within the few milliseconds it takes to redo the
> truncation of a single table.

Yes, that's why your test measuring vacuum execution time is better: it is
more reproducible.
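
For anyone who still wants to catch it in the act, one way is to poll
pg_stat_activity on the standby while the truncations are replayed; a minimal
sketch (the startup process's AccessExclusiveLock shows up on blocked backends
as wait_event_type 'Lock' with wait_event 'relation'):

    SELECT pid, backend_type, wait_event_type, wait_event, query
      FROM pg_stat_activity
     WHERE wait_event_type = 'Lock'
       AND wait_event = 'relation';

Running that under \watch 0.1 in a second psql session improves the odds of
hitting the short window.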

>
> Instead, I just measured the whole recovery time, smgr_redo(),
> to show the recovery improvement compared to head. Please refer below.
>
> [Recovery Test]
> I used the same stored functions and configurations in the previous email
> & created "test" db.
>
> $ createdb test
> $ psql -d test
>
> 1. [Primary] Create 10,000 relations.
> test=# SELECT create_tables(10000);
>
> 2. [P] Insert one row in each table.
> test=# SELECT insert_tables(10000);
>
> 3. [P] Delete row of each table.
> test=# SELECT delfrom_tables(10000);
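
(For readers without the previous email at hand: the helpers are plain plpgsql
loops over numbered tables. A hypothetical sketch of create_tables, assuming
single-column tables named test1..testN; insert_tables and delfrom_tables
follow the same pattern with an INSERT and a DELETE:

    CREATE OR REPLACE FUNCTION create_tables(n int) RETURNS void AS $$
    BEGIN
      FOR i IN 1..n LOOP
        -- one small table per iteration; each gets its own relation forks
        EXECUTE format('CREATE TABLE test%s (id int)', i);
      END LOOP;
    END;
    $$ LANGUAGE plpgsql;

The exact definitions used for the numbers below are the ones from the
previous email.)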
>
> 4. [Standby] WAL application is stopped at Standby server.
> test=# SELECT pg_wal_replay_pause();
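
(That replay is really paused can be confirmed with
SELECT pg_is_wal_replay_paused();, which should return t.)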
>
> 5. [P] VACUUM is executed at Primary side, and measure its execution time.
> test=# \timing on
> test=# VACUUM;
>
> Alternatively, you may use:
> $ time psql -d test -c 'VACUUM;'
> (Note: WAL has not replayed on standby because it's been paused.)
>
> 6. [P] Wait until VACUUM has finished execution. Then, stop primary server.
> $ pg_ctl stop -w
>
> 7. [S] Resume WAL replay, then promote standby (failover).
> I used a shell script to resume recovery & promote the standby server,
> because it's difficult to measure the recovery time by hand. Please refer
> to the script below.
> - "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
> - "pg_ctl promote" to promote standby.
> - The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured.
>
> shell script:
>
> PGDT=/path_to_storage_directory/
>
> if [ "$1" = "resume" ]; then
> psql -c "SELECT pg_wal_replay_resume();" test
> date +%Y/%m/%d_%H:%M:%S.%3N
> pg_ctl promote -D ${PGDT}
> set +x
> date +%Y/%m/%d_%H:%M:%S.%3N
> while [ 1 ]
> do
> RS=`psql -Atc "select pg_is_in_recovery();" test`
> if [ ${RS} = "f" ]; then
> break
> fi
> done
> date +%Y/%m/%d_%H:%M:%S.%3N
> set -x
> exit 0
> fi
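
The script is invoked with 'resume' as its first argument, e.g.
./measure.sh resume (the file name is only an example); the recovery times
below are computed from the three timestamps it prints.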
>
>
> [Test Results]
> shared_buffers = 24GB
>
> 1. HEAD
> (wal replay resumed)
> 2019/07/01_08:48:50.326
> server promoted
> 2019/07/01_08:49:50.482
> 2019/07/01_09:02:41.051
>
> Recovery Time:
> 13 min 50.725 s -> Time difference from WAL replay to complete recovery
> 12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> 2. PATCH
> (wal replay resumed)
> 2019/07/01_07:34:26.766
> server promoted
> 2019/07/01_07:34:57.790
> 2019/07/01_07:34:57.809
>
> Recovery Time:
> 31.043 s -> Time difference from WAL replay to complete recovery
> 00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> [Conclusion]
> Recovery time improved significantly compared to head:
> from about 13 minutes down to about 30 seconds.
>
> Any thoughts?
> I'd really appreciate your comments/feedback about the patch and/or test.
>
>

Thanks for the time you spent on this test; it is a huge win!
Although creating 10k tables and deleting tuples is not a common use case, it
is still good to know how your patch performs.
I will try to look deeper into your patch, but my knowledge of postgres
internals is limited :)

--
Adrien
