Re: Mailing list subscription's mail delivery delays?

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-10-03 18:31:44
Message-ID: CABUevEx=Nswe8OLg7kdr4A3UNSTG_rTO4P8fs_VVpeJamzfp3w@mail.gmail.com
Lists: pgsql-www

On Mon, Oct 2, 2023 at 4:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I have been seeing the same thing for a few days now, on my
> >> definitely-not-gmail personal server. Something's flaky in the
> >> PG mail infrastructure. It's gotten better since yesterday's
> >> outage, though I'm not convinced it's totally fixed.
>
> > There have been some pretty bad issues with gmail recently. Some
> > changes have been deployed that will hopefully help mitigate those and
> > make things better, but it takes time to recover.
>
> > The massive backlogs caused by gmail have been enough to spill over
> > and affect other destinations as well simply due to the load created
> > since we have such a huge number of gmail subscribers. But we're
> > slowly seeing the backlogs shrink now and the load come down so
> > hopefully the changes made will continue to have effect and let us be
> > back to normal soon.
>
> I'm still seeing multi-hour delivery delays on a subset of traffic,
> like maybe half a dozen instances today.
>
> Looking at the Received: timestamps shows pretty conclusively that
> the delays are within PG infra, for example this recent message from
> Heikki got hung up at two separate jumps:
>
> Return-Path: <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>
> Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
> by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620
> (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
> for <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Mon, 2 Oct 2023 13:53:57 -0400
> Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
> by malur.postgresql.org with esmtp (Exim 4.94.2)
> (envelope-from <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>)
> id 1qnN7D-00GbGd-FB
> for tgl(at)sss(dot)pgh(dot)pa(dot)us; Mon, 02 Oct 2023 17:53:55 +0000
> Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
> by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> (Exim 4.94.2)
> (envelope-from <hlinnaka(at)iki(dot)fi>)
> id 1qnGcb-00AqOg-Ti
> for pgsql-hackers(at)lists(dot)postgresql(dot)org; Mon, 02 Oct 2023 10:57:53 +0000
> Received: from meesny.iki.fi ([195.140.195.201])
> by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> (Exim 4.94.2)
> (envelope-from <hlinnaka(at)iki(dot)fi>)
> id 1qnF5S-007kvc-AQ
> for pgsql-hackers(at)postgresql(dot)org; Mon, 02 Oct 2023 09:19:35 +0000
> Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
> (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
> key-exchange X25519 server-signature RSA-PSS (2048 bits))
> (No client certificate requested)
> (Authenticated sender: hlinnaka)
> by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
> Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
> Message-ID: <fe32d2a0-0998-d866-d6ee-2aed70b9be00(at)iki(dot)fi>
> Date: Mon, 2 Oct 2023 12:19:29 +0300
> ...
>
>
> Also, my own message <2154347(dot)1696278028(at)sss(dot)pgh(dot)pa(dot)us> went
> out to -hackers about 25 minutes ago and hasn't come back,
> so based on other recent examples I'm betting I won't see it
> for hours.
>
> Plenty of other traffic *is* coming through in normal-ish time,
> so I'm not sure I buy that there's still a massive logjam.

There is still definitely a problem, but it is slowly recovering. It
is *mostly* hitting gmail at this point, but there can be spillover
to others in some cases (for example, there's a general throttling
when the load on the server gets too high). In this particular case,
it coincides timing-wise with our old friend the oom-killer nuking
postgres on the machine, which stopped all incoming email for a
while before it got moving again. That particular problem should have
been taken care of completely by now. The general backlog/queueing
problem is still ongoing, but it has been improving.
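The hop-by-hop reading of the Received: timestamps in the quoted message can be reproduced with a short script. This is a minimal sketch, assuming the timestamps are copied by hand from the quoted headers; the hop labels and the 10-minute "stalled" threshold are illustrative choices, not anything the PG mail infrastructure itself uses.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
import re

# Received: timestamps copied from the quoted message, in header order
# (newest hop first). Labels are illustrative: "X -> Y" means Y received it.
RECEIVED = [
    ("malur -> sss.pgh.pa.us", "Mon, 2 Oct 2023 13:53:57 -0400"),
    ("localhost -> malur", "Mon, 02 Oct 2023 17:53:55 +0000"),
    ("makus -> malur", "Mon, 02 Oct 2023 10:57:53 +0000"),
    ("meesny.iki.fi -> makus", "Mon, 02 Oct 2023 09:19:35 +0000"),
    ("client -> meesny.iki.fi", "Mon, 2 Oct 2023 12:19:29 +0300 (EEST)"),
]


def parse_stamp(stamp: str):
    """Parse an RFC 5322 date, dropping a trailing '(zone comment)'."""
    return parsedate_to_datetime(re.sub(r"\s*\([^)]*\)\s*$", "", stamp))


def hop_delays(received):
    """Return (hop, delay) pairs in chronological order.

    Each relay prepends its Received: header, so the list is reversed to
    walk the path from the original sender to the final recipient.
    """
    chronological = list(reversed(received))
    stamps = [parse_stamp(ts) for _, ts in chronological]
    return [
        (chronological[i][0], stamps[i] - stamps[i - 1])
        for i in range(1, len(stamps))
    ]


if __name__ == "__main__":
    # Hypothetical threshold: flag any hop slower than 10 minutes.
    for hop, delay in hop_delays(RECEIVED):
        flag = "  <-- stalled" if delay > timedelta(minutes=10) else ""
        print(f"{hop:28} {delay}{flag}")
```

Because the offsets (+0300, +0000, -0400) are all normalized by the timezone-aware datetimes, the output shows roughly 1h38m between makus and malur and roughly 6h56m inside malur, while the first and last hops take only seconds. That matches the "two separate jumps" observation.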

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
