RE: An I/O error occurred while sending to the backend (PG 13.4)

From: "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: RE: An I/O error occurred while sending to the backend (PG 13.4)
Date: 2022-03-01 16:28:31
Message-ID: MN2PR15MB25602F46F8665F53E4E0692F85029@MN2PR15MB2560.namprd15.prod.outlook.com
Lists: pgsql-performance

> -----Original Message-----
> From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
> Sent: Monday, February 28, 2022 17:05
> To: ldh(at)laurent-hasson(dot)com
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
>
> On Mon, Feb 28, 2022 at 09:43:09PM +0000, ldh(at)laurent-hasson(dot)com
> wrote:
> > On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
> > > > And the aforementioned network trace. You could set a capture filter on TCP
> > > > SYN|RST so it's not absurdly large. From my notes, it might look like this:
> > > > (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)
> > >
> > > I'd also add '|| icmp'. My hunch is that you'll see some ICMP (not "ping")
> > > being sent by an intermediate gateway, resulting in the connection being
> > > reset.
> >
> > I am so sorry but I do not understand what you are asking me to do. I am
> > unfamiliar with these commands. Is this a postgres configuration file? Is this
> > something I just do once or something I leave on to hopefully catch it when
> > the issue occurs? Is this something to do on the DB machine or the ETL
> > machine? FYI:
>
> It's no problem.
>
> I suggest that you run wireshark with a capture filter to try to show *why*
> the connections are failing. I think the capture filter might look like:
>
> (icmp || (tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin)!=0)) && host 10.64.17.211
>
> With the "host" filtering for the IP address of the *remote* machine.
>
> You could run that on whichever machine is more convenient and leave it
> running for however long it takes for that error to happen. You'll be able to
> save a .pcap file for inspection. I suppose it'll show either a TCP RST or an
> ICMP.
> Whichever side sent that is where the problem is. I still suspect the issue
> isn't in postgres.
>
> > - My ETL machine is on 10.64.17.211
> > - My DB machine is on 10.64.17.210
> > - Both on Windows Server 2012 R2, x64
>
> These network details make my theory unlikely.
>
> They're on the same subnet with no intermediate gateways, and
> communicate directly via a hub/switch/crossover cable. If that's true, then
> both will have each other's hardware address in ARP after pinging from one
> to the other.
>
> --
> Justin
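
For readers unfamiliar with the capture filter quoted above, here is a minimal Python sketch of what its TCP part selects: any segment with at least one of the RST, SYN, or FIN bits set. The flag values are the standard TCP header bits; the function name matches_tcp_part is hypothetical and only for illustration. It is not part of wireshark or of Justin's suggestion.

```python
# Standard TCP header flag bits.
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04
TCP_PSH = 0x08
TCP_ACK = 0x10

# Equivalent of (tcp-rst|tcp-syn|tcp-fin) in the capture filter.
FILTER_MASK = TCP_RST | TCP_SYN | TCP_FIN


def matches_tcp_part(flags: int) -> bool:
    """Mirror of: tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin) != 0

    `flags` is the flags byte of a TCP header. The function name is
    hypothetical; it only illustrates the bit test the filter performs.
    """
    return (flags & FILTER_MASK) != 0


if __name__ == "__main__":
    # A reset is captured; a plain data segment (PSH+ACK) is not.
    print(matches_tcp_part(TCP_RST | TCP_ACK))
    print(matches_tcp_part(TCP_PSH | TCP_ACK))
```

Because ordinary data segments carry only ACK (and often PSH), the filter keeps connection setup, teardown, and resets while dropping bulk traffic, which is why the capture stays small.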

Yes, the machines ARE on the same subnet. I have been told they are even in the same physical rack. When I run a tracert, I get this:

Tracing route to PRODDB.xxx.int [10.64.17.210] over a maximum of 30 hops:
  1     1 ms    <1 ms    <1 ms  PRODDB.xxx.int [10.64.17.210]
Trace complete.
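
As a cross-check on the single-hop trace above, here is a small Python sketch that tests whether two addresses fall in the same network. The /24 prefix length is an assumption; the actual netmask of these machines is not given in this thread, and the helper name same_subnet is hypothetical.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses belong to the same network.

    prefix_len is the netmask length; the real value for these hosts
    is not known from this thread, so it must be supplied by the caller.
    """
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


if __name__ == "__main__":
    # ETL machine and DB machine, with an ASSUMED /24 netmask.
    print(same_subnet("10.64.17.211", "10.64.17.210", 24))
```

The netmask reported by ipconfig on either machine would replace the assumed 24.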

Now, there is an additional component, I think: storage is on an array, and I am not getting a clear answer as to where it is 😊 Is it possible that something is happening at the storage layer? Could a problem there surface in Postgres as a network issue rather than a storage issue?

Also, both machines are actually VMs. I forgot to mention that, and I am not sure whether it's relevant.

Thank you,
Laurent.
