Re: Mailing list subscription's mail delivery delays?

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-10-04 16:30:52
Message-ID: CABUevEwnM3H0pPs0aDQB36n6PzzqYi38LgSnJZRRugQ_CWxHkA@mail.gmail.com
Lists: pgsql-www

On Tue, Oct 3, 2023 at 2:31 PM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> On Mon, Oct 2, 2023 at 4:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > > On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >> I have been seeing the same thing for a few days now, on my
> > >> definitely-not-gmail personal server. Something's flaky in the
> > >> PG mail infrastructure. It's gotten better since yesterday's
> > >> outage, though I'm not convinced it's totally fixed.
> >
> > > There have been some pretty bad issues with gmail recently. Some
> > > changes have been deployed that will hopefully help mitigate those and
> > > make things better, but it takes time to recover.
> >
> > > The massive backlogs caused by gmail have been enough to spill over
> > > and affect other destinations as well simply due to the load created
> > > since we have such a huge number of gmail subscribers. But we're
> > > slowly seeing the backlogs shrink now and the load come down so
> > > hopefully the changes made will continue to have effect and let us be
> > > back to normal soon.
> >
> > I'm still seeing multi-hour delivery delays on a subset of traffic,
> > like maybe half a dozen instances today.
> >
> > Looking at the Received: timestamps shows pretty conclusively that
> > the delays are within PG infra, for example this recent message from
> > Heikki got hung up at two separate jumps:
> >
> > Return-Path: <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>
> > Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
> > by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620
> > (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
> > for <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Mon, 2 Oct 2023 13:53:57 -0400
> > Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
> > by malur.postgresql.org with esmtp (Exim 4.94.2)
> > (envelope-from <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>)
> > id 1qnN7D-00GbGd-FB
> > for tgl(at)sss(dot)pgh(dot)pa(dot)us; Mon, 02 Oct 2023 17:53:55 +0000
> > Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
> > by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> > (Exim 4.94.2)
> > (envelope-from <hlinnaka(at)iki(dot)fi>)
> > id 1qnGcb-00AqOg-Ti
> > for pgsql-hackers(at)lists(dot)postgresql(dot)org; Mon, 02 Oct 2023 10:57:53 +0000
> > Received: from meesny.iki.fi ([195.140.195.201])
> > by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> > (Exim 4.94.2)
> > (envelope-from <hlinnaka(at)iki(dot)fi>)
> > id 1qnF5S-007kvc-AQ
> > for pgsql-hackers(at)postgresql(dot)org; Mon, 02 Oct 2023 09:19:35 +0000
> > Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
> > (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
> > key-exchange X25519 server-signature RSA-PSS (2048 bits))
> > (No client certificate requested)
> > (Authenticated sender: hlinnaka)
> > by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
> > Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
> > Message-ID: <fe32d2a0-0998-d866-d6ee-2aed70b9be00(at)iki(dot)fi>
> > Date: Mon, 2 Oct 2023 12:19:29 +0300
> > ...
> >
> >
> > Also, my own message <2154347(dot)1696278028(at)sss(dot)pgh(dot)pa(dot)us> went
> > out to -hackers about 25 minutes ago and hasn't come back,
> > so based on other recent examples I'm betting I won't see it
> > for hours.
> >
> > Plenty of other traffic *is* coming through in normal-ish time,
> > so I'm not sure I buy that there's still a massive logjam.
>
> There is still definitely a problem, but it is slowly recovering. It
> is *mostly* hitting gmail at this point, but there can be spillover
> to others in some cases (for example, there's a general throttling
> when the load on the server gets too high). In this particular case,
> it coincides timing-wise with our old friend the oom-killer nuking
> postgres on the machine thereby stopping all incoming email for a
> while before it got moving again. That particular problem should have
> been taken care of completely by now, but the general backlog/queueing
> problem is still ongoing but has been improving.

We *think* this issue has now been mostly resolved. We are still
seeing some extra delays in deliveries to gmail right now, but that's
due to *us* slowing down the deliveries so as not to trigger things.
We are now talking about delays of minutes or tens of minutes, not
hours or tens of hours. Non-gmail recipients should now be back to
being mostly unaffected.

We're continuing to monitor the situation of course, and to make
careful modifications to bring us back to the quicker delivery times.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
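
The hop-by-hop reading of the Received: timestamps quoted earlier in
this message can be reproduced with a short script. This is only a
sketch, not a tool used by anyone in the thread: the function names
(parse_stamp, hop_delays) and the short host labels are made up for
illustration, and the timestamps are copied by hand from the quoted
headers rather than parsed out of a real mbox.

```python
import re
from email.utils import parsedate_to_datetime

# Trailing timestamps of the quoted Received: headers, listed top to
# bottom (newest hop first). The labels are informal names chosen for
# this sketch, not anything the mail servers emit.
RECEIVED = [
    ("sss.pgh.pa.us", "Mon, 2 Oct 2023 13:53:57 -0400"),
    ("malur (from localhost)", "Mon, 02 Oct 2023 17:53:55 +0000"),
    ("malur (from makus)", "Mon, 02 Oct 2023 10:57:53 +0000"),
    ("makus", "Mon, 02 Oct 2023 09:19:35 +0000"),
    ("meesny.iki.fi", "Mon, 2 Oct 2023 12:19:29 +0300 (EEST)"),
]


def parse_stamp(stamp):
    """Parse an RFC 5322 style date, dropping comments such as "(EEST)"."""
    cleaned = re.sub(r"\([^)]*\)", "", stamp).strip()
    return parsedate_to_datetime(cleaned)


def hop_delays(received):
    """Return (from, to, seconds) for each hop, oldest hop first."""
    hops = list(reversed(received))  # Received: headers are newest-first
    delays = []
    for (prev_name, prev_ts), (name, ts) in zip(hops, hops[1:]):
        delta = parse_stamp(ts) - parse_stamp(prev_ts)
        delays.append((prev_name, name, int(delta.total_seconds())))
    return delays


for src, dst, seconds in hop_delays(RECEIVED):
    print(f"{src} -> {dst}: {seconds} s")
```

With the headers quoted above, two hops stand out: makus to malur
takes about 1 hour 38 minutes, and the message then sits on malur for
about 6 hours 56 minutes before going out. The remaining hops take a
few seconds each, which matches the "two separate jumps" observation.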
