From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: design for parallel backup |
Date: | 2020-04-21 05:31:49 |
Message-ID: | 20200421053149.cjqiwohw5ge6bwa4@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2020-04-21 10:20:01 +0530, Amit Kapila wrote:
> It is quite likely that compression can benefit more from parallelism
> as compared to the network I/O as that is mostly a CPU intensive
> operation but I am not sure if we can just ignore the benefit of
> utilizing the network bandwidth. In our case, after copying from the
> network we do write that data to disk, so during filesystem I/O the
> network can be used if there is some other parallel worker processing
> other parts of data.
Well, as I said, network and FS IO as done by server / pg_basebackup are
both fully buffered by the OS. Unless the OS throttles the userland
process, a large chunk of the work will be done by the kernel, in
separate kernel threads.
My workstation and my laptop can, in a single thread each, get close to
20GBit/s of network IO (bidirectional 10GBit, I don't have faster - it's
a thunderbolt 10gbe card) and iperf3 is at 55% CPU while doing so. Just
connecting locally it's 45GBit/s. Or over 8GByte/s of buffered
filesystem IO. And it doesn't even have that high a per-core clock speed.
I just don't see this being the bottleneck for now.
> Also, there may be some users who don't want their data to be
> compressed due to some reason like the overhead of decompression is so
> high that restore takes more time and they are not comfortable with
> that as for them faster restore is much more critical than compressed
> or fast back up. So, for such things, the parallelism during backup
> as being discussed in this thread will still be helpful.
I am not even convinced it'll be helpful in a large fraction of
cases. The added overhead of more connections / processes isn't free.
I believe there are some cases where it'd help. E.g. if there are
multiple tablespaces on independent storage, parallelism as described
here could lead to significantly better utilization of the different
tablespaces. But that'd require sorting work between processes
appropriately.
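To illustrate what "sorting work appropriately" could mean, here is a rough
sketch, not taken from any patch in this thread. It assumes each file to be
backed up is known as a (tablespace, path, size) tuple and that each
tablespace sits on its own storage. The function name, the tuple layout and
the paths are all made up for the example. It keeps every tablespace on a
single worker, and hands the largest tablespaces out first to whichever
worker currently has the least data assigned.

```python
import heapq
from collections import defaultdict


def assign_tablespaces(files, n_workers):
    """Hypothetical helper: split files across workers by tablespace.

    files: iterable of (tablespace, path, size_bytes) tuples.
    Returns a list with one list of paths per worker.
    """
    totals = defaultdict(int)    # bytes per tablespace
    members = defaultdict(list)  # paths per tablespace
    for tablespace, path, size in files:
        totals[tablespace] += size
        members[tablespace].append(path)

    # Min-heap of (assigned_bytes, worker_id): least-loaded worker pops first.
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]

    # Largest tablespace first; a tablespace is never split across workers,
    # so two workers never compete for the same storage device.
    for ts in sorted(totals, key=lambda t: (-totals[t], t)):
        load, worker = heapq.heappop(heap)
        plan[worker].extend(members[ts])
        heapq.heappush(heap, (load + totals[ts], worker))
    return plan


# Made-up example input: three tablespaces, two workers.
example_files = [
    ("ts_a", "ts_a/1", 100),
    ("ts_a", "ts_a/2", 50),
    ("ts_b", "ts_b/1", 120),
    ("ts_c", "ts_c/1", 20),
]
example_plan = assign_tablespaces(example_files, 2)
```

With more workers than tablespaces the extra workers stay idle in this
sketch, which is part of why the benefit seems limited to specific setups.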
> OTOH, I think without some measurements it is difficult to say that we
> have significant benefit by parallelizing the backup without compression.
> I have scanned the other thread [1] where the patch for parallel
> backup was discussed and didn't find any performance numbers, so
> probably having some performance data with that patch might give us a
> better understanding of introducing parallelism in the backup.
Agreed, we need some numbers.
Greetings,
Andres Freund