Re: Direct I/O

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Direct I/O
Date: 2023-04-08 21:15:34
Message-ID: CA+hUKGJ2JqN1O=kfdbZfVZKpTCkZXY4=nMwc1U4xe39YE66GTw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 9, 2023 at 9:10 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384

> The fact that the error is happening in a parallel worker seems
> interesting ...

That's because it's running with debug_parallel_query=regress. I've
been trying to repro that but no luck... A different kind of failure
also showed up, where it counted the wrong number of tuples:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2023-04-08%2015%3A52%3A03

A paranoid explanation would be that this system is failing to provide
basic I/O coherency: we're writing pages out and not reading the same
contents back in. Or of course there is a dumb bug... but why only here?
It could of course be timing-sensitive, and it's interesting that crake
suffers from the "no unpinned buffers available" thing (which should now
be gone) with higher frequency; I'm keen to see if the dodgy-read problem
continues with a similar frequency now.
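
To make the "basic I/O coherency" idea concrete, here is a minimal sketch of the kind of write-then-read-back check one could run on a suspect machine. It is not PostgreSQL code and not part of 004_io_direct.pl; the function names (`page_pattern`, `roundtrip`, `check_coherency`) are hypothetical, and the 8192-byte block size is simply PostgreSQL's default page size. It assumes a platform where Python exposes `os.O_DIRECT` (e.g. Linux) and falls back to buffered I/O where the flag is missing or the file system rejects it. It runs in a single process, so it would not exercise the timing or parallel-worker aspects discussed above.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def page_pattern(blkno):
    """Return a BLOCK_SIZE-byte page whose contents identify the block."""
    return blkno.to_bytes(4, "little") * (BLOCK_SIZE // 4)


def roundtrip(path, extra_flags, nblocks):
    """Write nblocks pages, reopen the file, and return the block numbers
    whose contents did not read back exactly as written."""
    # An anonymous mapping gives page-aligned memory, which O_DIRECT needs.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    try:
        wfd = os.open(path, os.O_WRONLY | extra_flags)
        with os.fdopen(wfd, "wb", buffering=0) as f:
            for blkno in range(nblocks):
                buf[:] = page_pattern(blkno)
                if f.write(buf) != BLOCK_SIZE:
                    raise OSError("short write")
        bad = []
        rfd = os.open(path, os.O_RDONLY | extra_flags)
        with os.fdopen(rfd, "rb", buffering=0) as f:
            for blkno in range(nblocks):
                buf[:] = bytes(BLOCK_SIZE)  # clear stale contents first
                nread = f.readinto(buf)
                if nread != BLOCK_SIZE or buf[:] != page_pattern(blkno):
                    bad.append(blkno)
        return bad
    finally:
        buf.close()


def check_coherency(directory, nblocks=64):
    """Return the list of incoherent blocks for a scratch file in directory."""
    direct = getattr(os, "O_DIRECT", 0)  # 0 where the flag does not exist
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        try:
            return roundtrip(path, direct, nblocks)
        except OSError:
            # Assumption: the file system rejected O_DIRECT; retry buffered.
            return roundtrip(path, 0, nblocks)
    finally:
        os.unlink(path)
```

On a healthy file system `check_coherency(some_dir)` should return an empty list; any block number in the result would point at the storage stack rather than at PostgreSQL.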
