Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2022-04-26 06:11:20
Message-ID: CA+hUKGJE6K88=ifPd-2Lt8QJrnS4H+u-btoVCP+i=0Qv=9D2Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 26, 2022 at 6:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I believe that the WAL prefetch patch probably accounts for the
> intermittent errors that buildfarm member topminnow has shown
> since it went in, eg [1]:
>
> diff -U3 /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out /home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results/pg_walinspect.out

Hmm, maybe but I suspect not. I think I might see what's happening here.

> +ERROR: could not read WAL at 0/1903E40

> I've reproduced this manually on that machine, and confirmed that the
> proximate cause is that XLogNextRecord() is returning NULL because
> state->decode_queue_head == NULL, without bothering to provide an errormsg
> (which doesn't seem very well thought out in itself). I obtained the

Thanks for doing that. After several hours of trying I also managed
to reproduce it on that gcc23 system (not at all sure why it doesn't
show up elsewhere; MIPS 32 bit layout may be a factor), and added some
trace to get some more clues. Still looking into it, but here is the
current hypothesis I'm testing:

1. The reason there's a messageless ERROR in this case is because
there is new read_page callback logic introduced for pg_walinspect,
called via read_local_xlog_page_no_wait(), which is like the old
read_local_xlog_page() except that it returns -1 if you try to read
past the current "flushed" LSN, and we have no queued message. An
error is then reported by XLogReadRecord(), and appears to the user.

2. The reason pg_walinspect tries to read WAL data past the flushed
LSN is because its GetWALRecordsInfo() function keeps calling
XLogReadRecord() until EndRecPtr >= end_lsn, where end_lsn is taken
from a snapshot of the flushed LSN, but I don't see where it takes
into account that the flushed LSN might momentarily fall in the middle
of a record. In that case, xlogreader.c will try to read the next
page, which fails because it's past the flushed LSN (see point 1).

I will poke some more tomorrow to try to confirm this and try to come
up with a fix.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2022-04-26 06:26:59 Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Previous Message osumi.takamichi@fujitsu.com 2022-04-26 06:02:46 RE: Skipping schema changes in publication