Re: pg_upgrade --jobs

From: senor <frio_cervesa(at)hotmail(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_upgrade --jobs
Date: 2019-04-06 23:38:26
Message-ID: BYAPR01MB3701D38CD99FE2E23A2FE17FF7520@BYAPR01MB3701.prod.exchangelabs.com
Lists: pgsql-general

Thanks Tom. I suppose "pg_dump can only parallelize data dumping" answers my original question as "expected behavior" but I would like to understand the reason better.

My knowledge of Postgres and other DBMSs is at a casual admin level, with the occasional deep dive into specific errors or analysis. I'm not averse to getting into the code. Before my OP, I searched for reasons why the schema-only option would prevent pg_dump from running multiple jobs, but didn't find anything I understood that confirmed it either way.

Is the limitation simply the state of development to date, or is there something about dumping the schemas that conflicts with parallelizing it? I'm willing to do some studying if provided links to relevant articles.

The --link option to pg_upgrade would be so much more useful if it weren't still bound to serially dumping the schemas of half a million tables. As already mentioned, if there is an alternate process that mimics pg_upgrade but allows for parallelism, I'm open to that.

Thanks all

________________________________________
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Saturday, April 6, 2019 3:02 PM
To: senor
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: pg_upgrade --jobs

senor <frio_cervesa(at)hotmail(dot)com> writes:
> Since pg_upgrade is in control of how it is calling pg_dump, is there a reason pg_upgrade cannot use the directory output format when calling pg_dump? Is the schema-only operation incompatible?

Well, there's no point in it. pg_dump can only parallelize data dumping,
and there's none to be done in the --schema-only case that pg_upgrade
uses.

Also, since pg_upgrade *does* use parallelism across multiple pg_dump
calls (if you've got multiple databases in the cluster), it'd be a bit
problematic to have another layer of parallelism below that, if it did
indeed do anything. You don't want "--jobs=10" to suddenly turn into
100 sessions.
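The session multiplication in that last point can be illustrated with a minimal Python sketch. It does not run pg_upgrade or pg_dump: the function `peak_sessions` and the thread-pool model are hypothetical stand-ins, where the outer pool plays the role of parallel per-database dumps and each inner pool plays the role of a second, nested layer of jobs. A barrier forces the worst case in which every simulated session is open at the same moment, so the result is deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peak_sessions(outer_jobs: int, inner_jobs: int) -> int:
    """Hypothetical model: outer_jobs parallel "dump" workers, each running
    inner_jobs parallel "sessions". Returns the peak number of sessions
    open at once. Not actual pg_upgrade/pg_dump code."""
    if outer_jobs < 1 or inner_jobs < 1:
        raise ValueError("job counts must be at least 1")

    lock = threading.Lock()
    open_now = 0
    peak = 0
    # Every simulated session waits here until all of them are open,
    # forcing the worst case where both layers are busy simultaneously.
    barrier = threading.Barrier(outer_jobs * inner_jobs)

    def session() -> None:
        nonlocal open_now, peak
        with lock:
            open_now += 1
            peak = max(peak, open_now)
        barrier.wait(timeout=10)
        with lock:
            open_now -= 1

    def dump_one_database() -> None:
        # Inner layer: one simulated dump using inner_jobs sessions.
        with ThreadPoolExecutor(max_workers=inner_jobs) as pool:
            futures = [pool.submit(session) for _ in range(inner_jobs)]
            for f in futures:
                f.result()  # re-raise any error from a session

    # Outer layer: outer_jobs simulated dumps running in parallel.
    with ThreadPoolExecutor(max_workers=outer_jobs) as pool:
        futures = [pool.submit(dump_one_database) for _ in range(outer_jobs)]
        for f in futures:
            f.result()

    return peak


print(peak_sessions(10, 10))  # 100
```

With `inner_jobs=1` the peak equals the outer job count, which corresponds to one session per dump; any nested parallelism multiplies it.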

regards, tom lane
