| From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Zhang Mingli <zmlpostgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Subject: | Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax) |
| Date: | 2022-09-12 02:27:58 |
| Message-ID: | 20220912022758.GD31833@telsasoft.com |
| Lists: | pgsql-hackers |
On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154 has not been created yet -- apparent wraparound
> > >
> > > I guess what happened here is that after one of your (apparently
> > > several?) OOM crashes, crash recovery didn't run all the way to the
> > > true end of the WAL due to the maintenance_io_concurrency=0 bug. In
> > > the case you reported, it couldn't complete an end-of-recovery
> > > checkpoint until you disabled recovery_prefetch, but that's only
> > > because of the somewhat unusual way that vismap pages work. In
> > > another case it might have been able to (bogusly) complete a
> > > checkpoint, leaving things in an inconsistent state.
> >
> > I think what you're saying is that this can be explained by the
> > io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> Well I don't know, but it's one way I could think of that you could
> have a data page referring to a multixact that isn't on disk after
> recovery (because the data page happens to have been flushed, but we
> didn't replay the WAL that would create the multixact).
>
> > But yesterday I started from initdb and restored this cluster from backup, and
> > started up sqlsmith, and sent some kill -9, and now got more corruption.
> > Looks like it took ~10 induced crashes before this happened.
>
> $SUBJECT says 15b4, which doesn't have the fix. Are you still using
> maintenance_io_concurrency=0?
Yeah ... I just realized that I've already forgotten the relevant
chronology.
The io_concurrency bugfix wasn't included in 15b4, so (if I understood
you correctly) that might explain these symptoms, right?
--
Justin