From: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> |
---|---|
To: | Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP/PoC for parallel backup |
Date: | 2020-05-21 06:06:23 |
Message-ID: | CAGPqQf0Ehh-jxGRgYAk7j0oPRrW2Xk_d+h7f9yykznN2ewG=dQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com> wrote:
>
>
> On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>>
>>> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
>>> <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We at EnterpriseDB did some performance testing around this parallel
>>> > backup to check how beneficial it is, and below are the results. In this
>>> > testing, we ran the backup -
>>> > 1) Without Asif’s patch
>>> > 2) With Asif’s patch and a combination of workers: 1, 2, 4, 8.
>>> >
>>> > We ran those tests on two setups:
>>> >
>>> > 1) Client and Server both on the same machine (Local backups)
>>> >
>>> > 2) Client and server on a different machine (remote backups)
>>> >
>>> >
>>> > Machine details:
>>> >
>>> > 1: Server (on which local backups were performed, and used as the server
>>> > for remote backups)
>>> >
>>> > 2: Client (Used as a client for remote backups)
>>> >
>>> >
>>> ...
>>> >
>>> >
>>> > With client and server on the same machine, the results show around 50%
>>> > improvement in the parallel run with 4 and 8 workers. We don’t see a huge
>>> > performance improvement as more workers are added.
>>> >
>>> >
>>> > Whereas, when the client and server are on different machines, we don’t
>>> > see any major benefit in performance. This result matches the
>>> > testing results posted by David Zhang upthread.
>>> >
>>> >
>>> >
>>> > We ran the test for a 100GB backup with 4 parallel workers to see the CPU
>>> > usage and other information. What we noticed is that the server is consuming
>>> > almost 100% CPU the whole time, and pg_stat_activity shows that the server
>>> > is busy with ClientWrite most of the time.
>>> >
>>> >
>>>
>>> Was this for a setup where the client and server were on the same
>>> machine or where the client was on a different machine? If it was for
>>> the case where both are on the same machine, then ideally, we should
>>> see ClientRead events in a similar proportion?
>>>
>>
>> In the particular setup, the client and server were on different
>> machines.
>>
>>
>>> During an offlist discussion with Robert, he pointed out that current
>>> basebackup's code doesn't account for the wait event for the reading
>>> of files which can change what pg_stat_activity shows? Can you please
>>> apply his latest patch to improve basebackup.c's code [1] which will
>>> take care of that waitevent before getting the data again?
>>>
>>> [1] -
>>> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>
>>
>> Sure, we can try out this and do a similar run to collect the
>> pg_stat_activity output.
>>
>
> Have you had the chance to try this out?
>
Yes. My colleague Suraj tried this, and the pg_stat_activity output files
are attached.
We captured the wait events every 3 seconds during the backup for:
1: Parallel backup of 100GB data with 4 workers
(pg_stat_activity_j4_100GB.txt)
2: Normal backup (without the parallel backup patch) of 100GB data
(pg_stat_activity_normal_backup_100GB.txt)
Here are the observations:
Total number of events (pg_stat_activity) captured during the above runs:
- 314 events for normal backup
- 316 events for parallel backup (-j 4)
BaseBackupRead wait events (newly added):
- 37 in normal backup
- 25 in parallel backup (-j 4)
ClientWrite wait events:
- 175 in normal backup
- 1098 in parallel backup (-j 4)
ClientRead wait events:
- 0 in normal backup
- 326 in parallel backup (-j 4), across different processes (all in idle state)
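For anyone who wants to redo the counting, here is a rough sketch of how per-event totals like the ones above could be tallied from periodic pg_stat_activity samples. The in-memory layout (a list of snapshots, each a list of row dicts with a "wait_event" key), the function name tally_wait_events, and the sample rows are all assumptions made for illustration; this is not the format of the attached files nor the script that was actually used.

```python
from collections import Counter


def tally_wait_events(samples):
    """Count how often each wait event appears across all samples.

    `samples` is a list of snapshots; each snapshot is a list of rows,
    and each row is a dict with a "wait_event" key (None when the
    backend was not waiting). This layout is assumed for illustration.
    """
    counts = Counter()
    for snapshot in samples:
        for row in snapshot:
            event = row.get("wait_event")
            if event is not None:
                counts[event] += 1
    return counts


# Two hypothetical snapshots taken 3 seconds apart (made-up rows).
samples = [
    [{"pid": 101, "wait_event": "ClientWrite"},
     {"pid": 102, "wait_event": "BaseBackupRead"}],
    [{"pid": 101, "wait_event": "ClientWrite"},
     {"pid": 102, "wait_event": None}],
]
counts = tally_wait_events(samples)
print(counts["ClientWrite"], counts["BaseBackupRead"])
```

Note that in a tally like this every backend row in a snapshot is counted separately, so with several backends per snapshot a per-event total can be larger than the number of snapshots taken.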
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Attachment | Content-Type | Size |
---|---|---|
pg_stat_activity_j4_100GB.txt | text/plain | 531.8 KB |
pg_stat_activity_normal_backup_100GB.txt | text/plain | 370.4 KB |