On Thu, Apr 14, 2022 at 07:45:15PM -0700, Noah Misch wrote:
> On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote:
> > On 2022-04-14 21:32:27 -0400, Tom Lane wrote:
> > > Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly
> > > > the same set of symptoms (the same index-only scan create_index
> > > > regressions),
> > >
> > > Hm. I'm starting to get the feeling that the real problem here is
> > > we've "optimized" the system to the point where repeatable results
> > > from VACUUM are impossible :-(
> >
> > The synchronous_commit issue is an old one. It might actually be worth
> > addressing it by flushing out pending async commits instead. It just
> > started to be noticeable when tenk1 load and vacuum were moved closer.
> >
> >
> > What do you think about applying a polished version of what I posted in
> > https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de
> > ? That'd tell us a bit more about the horizon etc.
>
> No objection.
>
> > It's also interesting that it only happens in the installcheck cases,
> > afaics, not the check ones. Although that might just be because there's
> > more of them...
>
> I suspect the failure is somehow impossible in "check". Yesterday, I cranked
> up the number of locales, so there are now a lot more installcheck runs. Before
> that, each farm run had one "check" and two "installcheck". Those days saw
> ten installcheck failures, zero check failures.
>
> Like Tom, I'm failing to reproduce this outside the buildfarm client. I wrote
> a shell script to closely resemble the buildfarm installcheck sequence, but
> it's lasted a dozen runs without failing.

But 24s after that email, it did reproduce the problem. Same symptoms as the
last buildfarm runs, including visfrac=0. I'm attaching my script. (It has
various references to my home directory, so it's not self-contained.)