From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | sfrost(at)snowman(dot)net |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, zeleny(dot)ales(at)gmail(dot)com, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208 |
Date: | 2019-11-14 05:28:03 |
Message-ID: | 20191114.142803.1155800695686449695.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello.
At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> Greetings,
>
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > > * Aleš Zelený (zeleny(dot)ales(at)gmail(dot)com) wrote:
> > >> But recovery on replica failed to proceed WAL file
> > >> 0000000100000FED00000039 with log message: " invalid contrecord length
> > >> 1956 at FED/38FFE208".
> >
> > > Err- you've drawn the wrong conclusion from that message (and you're
> > > certainly not alone- it's a terrible message and we should really have a
> > > HINT there or something).
> >
> > Yeah, those messages are all pretty ancient, from when WAL was new and not
> > to be trusted much. Perhaps the thing to do is move the existing info
> > into DETAIL and make the primary message be something like "reached
> > apparent end of WAL stream".
>
> Yes, +1 on that.
The attached is something like that. AFAICS we can assume that an
invalid record means the end of WAL as long as we are fetching
successive records and XLogPageRead() has not logged anything.
As the current comment in ReadRecord says, we don't have a message if
standby has been triggered, but that's not always the case. Still we
may have a message if new segment is available after triggering.
I used a kind-of-tricky way to handle optional errdetail but it may be
better to have some additional feature as an ereport
subfunction. (Maybe named "errif" or something?)
I think it works fine as expected but I find one arguable behavior.
In the first patch, the "end of WAL" message is accompanied by source,
LSN and timeline.
LOG: reached end of WAL in archive at 0/A3000060 on timeline 8
DETAIL: invalid record length at 0/A3000060: wanted 24, got 0
The last two just seems good but the first one shows "archive" even on
promotion. It is right that we scan archive after promotion but seems
confusing. So I attached another patch that shows operation instead of
WAL source.
LOG: reached end of WAL during streaming reaplication at 0/A5000060 on timeline 8
DETAIL: invalid record length at 0/A5000060: wanted 24, got 0
What do you think about this?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
log_end_of_wal.patch | text/x-patch | 3.6 KB |
log_end_of_wal_2.patch | text/x-patch | 3.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Abhijit Gharami | 2019-11-14 05:38:18 | root page 3 of index "pg_class_oid_index" has level 0, expected 1 |
Previous Message | Gerrit Fouche | 2019-11-14 05:18:18 | Re: terminated by signal 11: Segmentation fault |