From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce wait_for_subscription_sync for TAP tests |
Date: | 2022-09-30 02:11:49 |
Message-ID: | CA+hUKG+WKsZpdoryeqM8_rk5uQPCqS2HGY92WiMGFsK2wVkcig@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Sep 12, 2022 at 10:42 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Sat, Sep 10, 2022 at 10:00 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > On Sat, Sep 10, 2022 at 9:45 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> writes:
> > > > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef
> > > > is relevant.
> > >
> > > Noting that the errors have only appeared in the past couple of
> > > days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f
> > > (Fix recovery_prefetch with low maintenance_io_concurrency).
> >
> > Yeah, I also just spotted the coincidence of those failures while
> > monitoring the build farm. I'll look into this later today. My
> > initial suspicion is that there was pre-existing code here that was
> > (incorrectly?) relying on the lack of error reporting in that case.
> > But maybe I misunderstood and it was incorrect to report the error for
> > some reason that was not robustly covered with tests.
>
> After I wrote that I saw Sawada-san's message and waited for more
> information, and I see there was now a commit. I noticed that
> peripatus was already logging the 'missing contrecord' error even when
> it didn't fail the test, and still does. I'm still looking into that
> (ie whether I need to take that new report_invalid_record() call out
> and replace it with errormsg_deferred = true so that XLogReadRecord()
> returns NULL with no error message in this case).
I will go ahead and remove this new error message added by adb46615.
The message was correlated with the problem on peripatus fixed by
88f48831, but not the cause of it -- but it's also not terribly
helpful and might be confusing. It might be reported: (1) in
pg_waldump when you hit the end of the segment with a missing
contrecord after the end, arguably rightfully, but then perhaps
someone might complain that they expect an error from pg_waldump only
on the final live segment at the end of the WAL, and (2) in a
walsender that is asked to shut down while between reads of pages with
a spanning contrecord, reported by logical_read_xlog_page() with a
messageless error (presumably peripatus's case?), (3) in crash
recovery with wal_recycle off (whereas normally you'd expect to see
the page_addr message from a recycled file), maybe more legitimately
than the above. The problem I needed to solve can be solved without
the message just by setting that flag as mentioned above, so I'll do
that to remove the new noise.
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2022-09-30 02:15:19 | Re: more descriptive message for process termination due to max_slot_wal_keep_size |
Previous Message | Andres Freund | 2022-09-30 02:09:33 | Re: longfin and tamandua aren't too happy but I'm not sure why |