Re: Direct I/O

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Direct I/O
Date: 2023-04-08 22:17:01
Message-ID: 7bcffa12-12a1-7c7b-d68a-a9a39dba06ec@dunslane.net
Lists: pgsql-hackers


On 2023-04-08 Sa 17:23, Andres Freund wrote:
> Hi,
>
> On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
>> Thomas Munro<thomas(dot)munro(at)gmail(dot)com> writes:
>> Now crake is doing this:
>>
>> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1
>> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384
>> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT: select count(*) from t1
>> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 004_io_direct.pl ERROR: invalid page in block 56 of relation base/5/16384
>> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 004_io_direct.pl STATEMENT: select count(*) from t1
>> 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG: background worker "parallel worker" (PID 3257646) exited with exit code 1
>>
>> The fact that the error is happening in a parallel worker seems
>> interesting ...
> There were a few prior instances of that error. One that I hadn't seen before
> is this:
>
> [11:35:07.190](0.001s) # Failed test 'read back from shared'
> # at /home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl line 43.
> [11:35:07.190](0.000s) # got: '10000'
> # expected: '10098'
>
> For one it points to the arguments to is() being switched around, but that's a
> sideshow.
>
>
> It's also odd that it's just crake having the issue. It's just a linux host,
> afaics. Andrew, is there any chance you can run that test in isolation and see
> whether it reproduces? If so, does the problem vanish, if you comment out the
> io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or
> whether it's an independent issue exposed by the new test.
>
>
> I wonder if we should make the test use data checksum - if we continue to see
> the wrong query results, the corruption is more likely to be in memory.
>

I can run the test in isolation, and it gets an error reliably.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
