Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

From: Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
Date: 2020-01-10 02:23:42
Message-ID: 20200110022342.GA92515@gate.oper.dinoex.org
Lists: pgsql-general pgsql-hackers

On Thu, Jan 09, 2020 at 04:31:44PM -0500, Tom Lane wrote:
! Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org> writes:
! > flowmdev=> select * from flows;
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
!
! Oh ... that does look pretty broken. However, we've had no other similar
! reports, so there must be something unique to your configuration. Busted
! GSSAPI library, or some ABI inconsistency, perhaps? What platform are you
! on, and how did you build or obtain this Postgres code?

This is FreeBSD 11.3-p3 r351611, built from source. Postgres is built from
https://svn0.eu.freebsd.org/ports/branches/2019Q4 (rel. 12r1) or
https://svn0.eu.freebsd.org/ports/branches/2020Q1 (rel. 12.1)
with "make package install".
I have a build environment for base and ports that forces recompiles on
any change and should make ABI inconsistencies quite hard to create.

All local patches are versioned and documented; there are none that
I could imagine influencing this.
There are no patches on postgres. Also no patches on the GSSAPI.
There are a couple of patches on Heimdal, to fix broken
command-line parsing, broken pidfile handling and broken daemonization.
None of them touches the core functionality (like key handling).

But I just noticed something of interest (which I had taken for
granted when importing the database): the flaw does NOT appear when
accessing the database from the server's local system (with TCP and
GSSAPI encryption active). It appears only from a remote system.

But then, if I go to the local system and change the MTU:
# ifconfig lo0 mtu 1500
and restart the server, then I get the exact same errors locally.

I cannot make sense of that. With the default lo0 MTU of 16384 the
packets go on the network with the full 8256 bytes you send. With
MTU 1500 they are split into 1448-byte pieces; but TCP is supposed
to handle this transparently. And what difference would the
encryption make here?
> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536
The socket buffers are also bigger than that. No, I don't understand it.
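
To spell out the arithmetic behind those numbers, here is a small
sketch. It assumes IPv4 with a 20-byte IP header, a 20-byte TCP header
and the 12-byte TCP timestamp option, which is what would turn an MTU
of 1500 into 1448-byte pieces. The function names are made up for
illustration only.

```python
# Assumed per-segment overhead: IPv4 header + TCP header + TCP timestamp option.
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def tcp_payload_per_segment(mtu):
    """Bytes of application data that fit into one segment for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - TCP_TIMESTAMP_OPTION


def split_into_segments(total_bytes, mtu):
    """Sizes of the pieces a single write of total_bytes is cut into."""
    payload = tcp_payload_per_segment(mtu)
    full, rest = divmod(total_bytes, payload)
    return [payload] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(tcp_payload_per_segment(1500))     # 1448
    print(split_into_segments(8256, 1500))   # five pieces of 1448, one of 1016
    print(split_into_segments(8256, 16384))  # a single piece of 8256
```

So under these assumptions an 8256-byte write travels as one piece
with the default lo0 MTU and as six pieces with MTU 1500.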

The only unusual thing: these are all VIMAGE jails. VIMAGE was considered
'experimental' some time ago and became production-ready in FreeBSD 12.0;
11.3 has a lower version number but was released later than 12.0,
whatever that implies.

Another thing I found out: the slower the network, the worse the
errors. So might it be that nobody has complained simply because the
people who use GSSAPI usually also have very fast machines and
networks nowadays?

When I go to packet-radio speed:
# ipfw pipe 4 config bw 10kbit/s

then I can see the query return empty as soon as the first bytes arrive:
flowmdev=# select * from flows;
flowmdev=#

and it does not even wait the 8 seconds for the first block to arrive.
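
For comparison, this is roughly what I would expect a receiver of
length-prefixed packets to do when the data trickles in piece by
piece. It is only a sketch and not the libpq code: the framing (a
4-byte big-endian length followed by the payload) and all function
names are assumptions for illustration. The point is that recv() may
return fewer bytes than asked for, so the reader has to loop until
the whole packet is there instead of returning on the first bytes.

```python
import socket
import struct
import threading


def read_exact(sock, n):
    """Loop over recv() until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-packet")
        buf.extend(chunk)
    return bytes(buf)


def read_packet(sock):
    """Read one packet: assumed 4-byte big-endian length, then the payload."""
    (length,) = struct.unpack("!I", read_exact(sock, 4))
    return read_exact(sock, length)


def send_in_pieces(sock, data, piece=1448):
    """Send data in small pieces, similar to what a small MTU causes."""
    for i in range(0, len(data), piece):
        sock.sendall(data[i:i + piece])


def demo():
    """Send one 8256-byte framed packet in pieces and read it back whole."""
    a, b = socket.socketpair()
    payload = b"\x44" * 8252                      # 8252 + 4-byte length = 8256
    framed = struct.pack("!I", len(payload)) + payload
    sender = threading.Thread(target=send_in_pieces, args=(a, framed))
    sender.start()
    received = read_packet(b)
    sender.join()
    a.close()
    b.close()
    return received == payload


if __name__ == "__main__":
    print(demo())
```

A reader that stops after a single short recv() would behave
differently depending on MTU and link speed, which is at least
consistent with what I see. That is a guess on my part and not a
diagnosis.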

rgds,
PMc
