Re: BUG #16604: pg_dump with --jobs breaks SSL connections

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zsolt Ero <zsolt(dot)ero(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: BUG #16604: pg_dump with --jobs breaks SSL connections
Date: 2020-10-17 22:58:23
Message-ID: 145080.1602975503@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I wrote:
> A bit of refactoring later, I have the attached, which seems to
> clean up all cases. I got rid of some bits of dead code along
> the way.

I pushed that at commit a45bc8a4f. However...

In some off-list discussion, Peter Eisentraut pointed out that
pg_dump/pg_restore are not the only programs with this problem.

Several of the src/bin/scripts/ programs, such as vacuumdb
and reindexdb, perform reconnections when told to use parallel
processing. These reconnections use methods similar to parallel
pg_restore and have the same bug. We can fix this straightforwardly
using much the same methods as in a45bc8a4f, as per 0001 below.
(0001 also fixes the documentation oversight that we didn't say that
these programs' --maintenance-db option could be a connstring.)

pg_dumpall, as well as "pg_dump -C" with text output, have a
variant of the issue: they emit psql scripts containing commands
such as "\connect databasename", and psql's \connect processing
isn't very good about re-using all connection options either.
In fact, it'll *only* re-use username, host name (and hostaddr if
given), and port. So if you do "psql -d '...some connection options'
and then try to run a dump script, the reconnections will fail
if there were other critical options in the -d connstring, much
like what was reported here. 0002 below is a draft attempt at
addressing this, but it's probably more controversial than 0001,
because there's a larger behavioral change involved.

An important thing to understand is that except for the \connect
case, all these programs perform a reconnection by re-using the
textual parameters given originally. Thus for example, if you do
"vacuumdb -j 4 -h host1,host2 ...", it's unclear which of host1 and
host2 will be connected to, either by the initial connection or any
of the parallel worker processes. I don't regard that as a bug
particularly. Perhaps some users would wish that once vacuumdb
made an initial connection, the workers would guarantee to reconnect
to that same host; but ISTM the real answer is "don't use that kind
of host spec for this purpose". (Note that there are other ways
to shoot yourself in the foot here, for example if your DNS can
resolve several different IPs for the same server host name. All
these cases seem more like pilot error than things for us to fix.)

\connect, however, is way out in left field. In the first place,
while these other programs are just interested in re-using all the
connection parameters except possibly the database name, \connect
intends to allow substitution of new values for some but not
necessarily all of the parameters. In the second place, \connect
makes some effort to force reconnection to occur to the very same
server if you don't specify a new host. Rather than re-using the
original parameter strings, it pulls out PQhost() and PQport()
values from the previous connection, and also PQhostaddr() if you
specified hostaddr previously. These values will reflect just the
server actually connected to, not the multi-host values you might
have given originally.

I'm inclined to think that this second behavior is simply wrong,
or at least a failure to adhere to the principle of least surprise.
An example of why it might be thought buggy is

postgres=# \connect "dbname=postgres host=sss1,sss1 hostaddr=127.0.0.1,::1"
You are now connected to database "postgres" as user "postgres".
postgres=# \connect -reuse-previous=on "dbname=postgres host=sss1,sss1"
could not match 2 host names to 1 hostaddr values
Previous connection kept

This happens because what -reuse-previous injects is not the previous
hostaddr connection parameter value ("127.0.0.1,::1"), but only the
actual connection address ("127.0.0.1"), resulting in a surprising
mismatch.

Moreover, psql isn't even being self-consistent by doing this.
If you get a connection loss, it will invoke PQreset() which just
re-uses the connection parameters previously given, and therefore
is perfectly content to connect to any of the hosts you named before.
It's quite unclear why \connect should be pickier than that case.

Therefore, my proposal in 0002 below is to drop all that logic.
The patch's method for re-using connection parameters is to fetch
the parameters of the old connection with PQconninfo(), which will
provide the textual values for each parameter slot, and then just
overwrite the specific slots that you provide new values for.
This seems to me to have considerably less surprise factor than
what happens now. But there's no denying that it changes behavior
in multi-host cases to a greater degree than is minimally necessary
to fix the bug complained of here.

There are some things adjacent to this that I'd kind of like to fix.
For example, I noted that createuser and dropuser lack any way to use
a connstring, as they have neither -d nor --maintenance-db options.
That seems pretty bogus, but it also seems like a missing feature not
a bug, so I didn't do anything about it here. (To be clear, I'm
proposing these changes for back-patch, as a45bc8a4f was.)

Comments?

regards, tom lane

Attachment Content-Type Size
0001-fix-reconnection-in-scripts-1.patch text/x-diff 37.8 KB
0002-fix-reconnection-in-psql-1.patch text/x-diff 14.8 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Alexander Zubkov 2020-10-18 11:01:22 Re: BUG #16659: postgresql leaks memory or do not limit its usage
Previous Message Tom Lane 2020-10-17 20:05:54 Re: date/time special values incorrectly cached as constant in plpgsql