From: | Vahur Sinijärv <vahursi(at)icloud(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17705: Segmentation fault in BufFileLoadBuffer |
Date: | 2022-12-05 06:24:20 |
Message-ID: | B6CDA783-A78E-4FC4-A359-DE4316325D58@icloud.com |
Lists: | pgsql-bugs |
Hi!
Where are these files normally created, and what names do they have? We may still have them from the last crash.
Vahur
> On 5. Dec 2022, at 05:41, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Sun, Dec 4, 2022 at 10:57 AM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
>> We started having random segmentation faults with our postgres 13.4 server,
>> running on RHEL 8.7. It was upgraded to 13.9, but the issue persists. The
>> database is fairly large, about 250GB on disk.
>>
>> I got core dump of one of the crashes and it shows SIGSEGV in
>> BufFileLoadBuffer. I tried to investigate this a little and it seems the
>> reason for it is that ExecHashJoinGetSavedTuple reads {0, 0} as header,
>> meaning hashvalue is 0 and tuple length is 0. Line 1277 in nodeHashjoin.c
>> subtracts sizeof(uint32) from 0 and passes it as size to BufFileRead(). GDB
>> shows size=18446744073627287632 at frame #1 which is not ((uint64_t) -4),
>> but -82263984. I think this is caused by BufFileRead which decrements
>> parameter 'size' by bytes read, so apparently it has read 82263980 bytes,
>> overwriting BufFile struct passed to BufFileLoadBuffer. Its files field now
>> contains ascii instead of pointer and file->files[file->curFile]; causes
>> SIGSEGV.
>>
>> Why it has read {0, 0} as saved tuple header, or what could have written
>> these zeroes there, I could not find out...
>
> Are you able to reproduce this on demand? Can you get your hands on
> the temporary file(s) it's reading? How large is it/are they?
> Perhaps we could write a little Python/whatever script to read the
> tuples back one at a time until it hits this {0, 0} header to confirm
> that it's definitely there, ie the bad header has actually been
> written out, which would help narrow down the location of the bug.
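
The size arithmetic in the quoted report and the scan suggested above can be sketched roughly as follows. This is a minimal sketch, not a tested tool, and it rests on several assumptions: that each saved tuple is laid out as a 4-byte hashvalue followed by a tuple whose first 4 bytes are its own length (as the report's description of ExecHashJoinGetSavedTuple implies), that the integers are little-endian, and that the temporary file is read as one continuous byte stream. The names `bytes_consumed` and `find_bad_header` are made up for illustration.

```python
import io
import struct

# Assumed layout: little-endian (hashvalue, t_len), both uint32.
HEADER = struct.Struct("<II")


def bytes_consumed(observed_size, t_len=0):
    """Bytes already read, given the wrapped 64-bit 'size' value seen in GDB."""
    initial = (t_len - 4) % 2**64          # t_len - sizeof(uint32) as a size_t
    return (initial - observed_size) % 2**64


def find_bad_header(stream):
    """Return (offset, hashvalue, t_len) of the first implausible record, else None."""
    while True:
        offset = stream.tell()
        raw = stream.read(HEADER.size)
        if not raw:
            return None                    # clean end of file, nothing suspicious
        if len(raw) < HEADER.size:
            return (offset, None, None)    # file ends in the middle of a header
        hashvalue, t_len = HEADER.unpack(raw)
        if t_len < 4:                      # t_len counts its own 4 bytes
            return (offset, hashvalue, t_len)
        body = stream.read(t_len - 4)      # rest of the tuple after the length word
        if len(body) < t_len - 4:
            return (offset, hashvalue, t_len)  # tuple runs past end of file
```

Against a real file this would be called as `find_bad_header(open(path, "rb"))`, where `path` is a placeholder for wherever the temporary file turns out to live. `bytes_consumed(18446744073627287632)` reproduces the 82263980 bytes mentioned in the report.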