Re: design for parallel backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: design for parallel backup
Date: 2020-04-22 18:40:17
Message-ID: CA+Tgmobc9MqRvwOOZcd9cxX8fNuMN8eKDMmywsuyLeg8ri+Vjg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 22, 2020 at 2:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I also can see a case for using N backends and one connection, but I
> think that'll be too complicated / too much bound by locking around the
> socket etc.

Agreed.
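
To spell out the concern for anyone skimming the thread: with a single
shared connection, every worker ends up serializing on a lock around
each write to the socket. Roughly this shape (a sketch only, using
pthreads purely for illustration; the real workers would be backends
and these names are made up):

/*
 * Sketch only, not from any patch: with one shared connection, every
 * worker serializes on the same lock for each chunk it sends.  The
 * pthread/send names are just to illustrate the shape.
 */
#include <pthread.h>
#include <sys/socket.h>

static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

static ssize_t
send_chunk(int sockfd, const void *buf, size_t len)
{
    ssize_t     sent;

    pthread_mutex_lock(&sock_lock);     /* N workers contend here per chunk */
    sent = send(sockfd, buf, len, 0);
    pthread_mutex_unlock(&sock_lock);

    return sent;
}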

> Oh? I find it *extremely* exciting here. This is pretty close to the
> worst case compressibility-wise, and zstd takes only ~22% of the time
> gzip does, while still delivering better compression. A nearly 5x
> improvement in compression times seems pretty exciting to me.
>
> Or do you mean for zstd over lz4, rather than anything over gzip? 1.8x
> -> 2.3x is a pretty decent improvement still, no? And being able to do
> it in 1/3 of the wall time seems pretty helpful.

I meant the latter thing, not the former. I'm taking it as given that
we don't want gzip as the only option. Yes, 1.8x -> 2.3x is decent,
but not as earth-shattering as 8.8x -> ~24x.

In any case, I lean towards adding both lz4 and zstd as options, so I
guess we're not really disagreeing here.
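
Since both libraries may well end up as options, here is roughly what
their simplest one-shot APIs look like side by side. This is just an
illustration, not proposed patch code; the buffer sizes and contents
are made up:

/*
 * Illustration only, not proposed patch code: one-shot compression of a
 * buffer with lz4 and zstd.  Buffer sizes and contents are made up.
 * Build with -llz4 -lzstd.
 */
#include <stdio.h>
#include <string.h>
#include <lz4.h>
#include <zstd.h>

int
main(void)
{
    char    src[8192];
    char    dst[16384];     /* comfortably above either compress bound */
    int     lz4len;
    size_t  zstdlen;

    memset(src, 'x', sizeof(src));      /* stand-in for a data file block */

    lz4len = LZ4_compress_default(src, dst, sizeof(src), sizeof(dst));
    if (lz4len > 0)
        printf("lz4:  %d bytes\n", lz4len);

    zstdlen = ZSTD_compress(dst, sizeof(dst), src, sizeof(src), 1 /* level */);
    if (!ZSTD_isError(zstdlen))
        printf("zstd: %zu bytes\n", zstdlen);

    return 0;
}

The calling conventions are close enough that supporting both probably
wouldn't change the shape of the surrounding code much.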

> > Parallel zstd still compresses somewhat better than single-core lz4,
> > but the difference in compression ratio is far less, and the amount of
> > CPU you have to burn in order to get that extra compression is pretty
> > large.
>
> It's "just" a ~2x difference for "level 1" compression, right? For
> having 1.9GiB less to write / permanently store of a 16GiB base
> backup, that doesn't seem that bad to me.

Sure, sure. I'm just saying that some people may not be OK with
ramping up to 10 or more compression threads on their master server
if it's already heavily loaded and maybe only has 4 vCPUs or whatever,
so we should have lighter-weight options for those people. I'm not
trying to argue against zstd or against the idea of ramping up large
numbers of compression threads; I'm just saying that lz4 looks awfully
nice for people who need some compression but are tight on CPU cycles.
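
For concreteness, here is roughly how zstd's internal worker threads
get turned on. This is only a sketch, and it assumes a libzstd built
with multithread support; the level and worker count are illustrative,
not a recommendation:

/*
 * Sketch only: enabling zstd's internal worker threads for streaming
 * compression.  Assumes a libzstd built with multithread support; the
 * compression level and worker count are illustrative.  Build with -lzstd.
 */
#include <stdio.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    char        src[] = "stand-in for a chunk of base backup data";
    char        dst[1024];
    ZSTD_inBuffer in = { src, strlen(src), 0 };
    size_t      remaining;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);
    /* returns an error (ignored here) if libzstd lacks multithread support */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

    /* single chunk, so flush and finish in one go; loop until fully flushed */
    do
    {
        ZSTD_outBuffer out = { dst, sizeof(dst), 0 };

        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining))
        {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(remaining));
            break;
        }
        /* out.dst / out.pos would be written to the backup stream here */
    } while (remaining != 0);

    ZSTD_freeCCtx(cctx);
    return 0;
}

The nice part is that the threading lives entirely inside libzstd, so
the caller's loop looks the same whether nbWorkers is 0 or 10.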

> I agree we should pick one. I think tar is not a great choice. .zip
> seems like it'd be a significant improvement - but not necessarily
> optimal.

Other ideas?
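
For reference while we kick around alternatives, this is the POSIX
ustar header that a tar-format backup writes in front of every member
(a sketch of the standard layout, not code from anywhere in the tree):

/*
 * For context only: the standard POSIX ustar header written in front of
 * every tar member.  All numeric fields are octal text, and there is no
 * central index, so finding one member means walking every header.
 */
struct ustar_header
{
    char    name[100];      /* member file name */
    char    mode[8];        /* permissions, octal text */
    char    uid[8];
    char    gid[8];
    char    size[12];       /* member size, octal text */
    char    mtime[12];
    char    chksum[8];
    char    typeflag;
    char    linkname[100];
    char    magic[6];       /* "ustar" */
    char    version[2];
    char    uname[32];
    char    gname[32];
    char    devmajor[8];
    char    devminor[8];
    char    prefix[155];
    /* padded with zeroes out to 512 bytes on disk */
};

The octal-text size fields and the lack of any central index are the
usual complaints; zip's central directory at least addresses the
latter.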

> > I don't want to get so caught up in advanced features here that we
> > don't make any useful progress at all. If we can add better features
> > without a large complexity increment, and without drawing objections
> > from others on this list, great. If not, I'm prepared to summarily
> > jettison it as nice-to-have but not essential.
>
> Just to be clear: I am not at all advocating tying a change of the
> archive format to compression method / parallelism changes or anything.

Good, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
