From: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-08-23 13:03:10
Message-ID: CALtqXTcsT5aoxcKK1shs+r37LO6ToPJ8feztSH6w-R0zMuQD2g@mail.gmail.com
Lists: pgsql-hackers
On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen(at)pivotal(dot)io> wrote:
> Hi Asif
>
> Interesting proposal. Bulk of the work in a backup is transferring files
> from source data directory to destination. Your patch is breaking this
> task down in multiple sets of files and transferring each set in parallel.
> This seems correct, however, your patch is also creating a new process to
> handle each set. Is that necessary? I think we should try to achieve this
> using multiple asynchronous libpq connections from a single basebackup
> process. That is to use PQconnectStartParams() interface instead of
> PQconnectdbParams(), which is currently used by basebackup. On the server
> side, it may still result in multiple backend processes per connection, and
> an attempt should be made to avoid that as well, but it seems complicated.
>
> What do you think?
>
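For reference, the non-blocking connection setup you mention would look roughly like this. This is only a sketch of the PQconnectStartParams()/PQconnectPoll() pattern; the host and dbname values are placeholders, and it needs a running server (and -lpq) to actually connect:

```c
/* Sketch: non-blocking libpq connection with PQconnectStartParams().
 * Connection parameters are placeholders for illustration only. */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

int main(void)
{
    const char *keywords[] = {"host", "dbname", NULL};
    const char *values[]   = {"localhost", "postgres", NULL};

    PGconn *conn = PQconnectStartParams(keywords, values, 0);
    if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
    {
        fprintf(stderr, "connection start failed\n");
        return 1;
    }

    /* Drive the handshake without blocking; with several connections the
     * same loop would select() over all of their sockets at once. */
    PostgresPollingStatusType st = PGRES_POLLING_WRITING;
    while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
    {
        fd_set fds;
        int sock = PQsocket(conn);

        FD_ZERO(&fds);
        FD_SET(sock, &fds);
        if (st == PGRES_POLLING_READING)
            select(sock + 1, &fds, NULL, NULL, NULL);
        else
            select(sock + 1, NULL, &fds, NULL, NULL);
        st = PQconnectPoll(conn);
    }

    printf(st == PGRES_POLLING_OK ? "connected\n" : "failed\n");
    PQfinish(conn);
    return 0;
}
```

With N such connections, one process can keep N COPY streams in flight without forking anything on the client side.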
The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I ask because
multiple pieces of hardware are involved in taking a backup (network,
CPU, disk). If the disk is already saturated, there is no point adding
parallelism, because we will be blocked on disk I/O anyway. I implemented
parallel backup in a separate application and had wonderful results. I
skimmed through the code and have some reservations: creating a separate
process just to copy data looks like overkill. There are two options:
non-blocking calls, or a pool of worker threads. But before choosing, we
need to identify pg_basebackup's actual bottleneck; after that we can see
what the best way to solve it is. Some numbers would help to understand
the real benefit.
--
Ibrar Ahmed