From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Direct I/O |
Date: | 2023-04-08 22:17:01 |
Message-ID: | 7bcffa12-12a1-7c7b-d68a-a9a39dba06ec@dunslane.net |
Lists: | pgsql-hackers |
On 2023-04-08 Sa 17:23, Andres Freund wrote:
> Hi,
>
> On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
>> Thomas Munro<thomas(dot)munro(at)gmail(dot)com> writes:
>> Now crake is doing this:
>>
>> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1
>> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384
>> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT: select count(*) from t1
>> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 004_io_direct.pl ERROR: invalid page in block 56 of relation base/5/16384
>> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 004_io_direct.pl STATEMENT: select count(*) from t1
>> 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG: background worker "parallel worker" (PID 3257646) exited with exit code 1
>>
>> The fact that the error is happening in a parallel worker seems
>> interesting ...
> There were a few prior instances of that error. One that I hadn't seen before
> is this:
>
> [11:35:07.190](0.001s) # Failed test 'read back from shared'
> # at /home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl line 43.
> [11:35:07.190](0.000s) # got: '10000'
> # expected: '10098'
>
> For one it points to the arguments to is() being switched around, but that's a
> sideshow.
>
>
> It's also odd that it's just crake having the issue. It's just a linux host,
> afaics. Andrew, is there any chance you can run that test in isolation and see
> whether it reproduces? If so, does the problem vanish, if you comment out the
> io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or
> whether it's an independent issue exposed by the new test.
>
>
> I wonder if we should make the test use data checksum - if we continue to see
> the wrong query results, the corruption is more likely to be in memory.
>
I can run the test in isolation, and it gets an error reliably.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com