Re: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Reid Thompson <Reid(dot)Thompson(at)omnicell(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode
Date: 2021-04-19 15:59:28
Message-ID: 20210419155928.GA3253@alvherre.pgsql
Lists: pgsql-general

On 2021-Apr-19, Reid Thompson wrote:

> Thanks - I found that, which seems to fix the error handling right? Or
> does it actually correct the cause of the segfault also?

Uh, what segfault? You didn't mention one. Yes, it fixes the error
handling, so when the system runs out of disk space, the error is
reported correctly instead of the query continuing.

... Ah, I see now that you mentioned in the subject line that the DB
goes into recovery mode. That's exactly why I was looking at that
problem last year. What I saw is that the hash-join spill-to-disk phase
runs out of disk space, so the temporary file is left corrupt; later the
hash-join reads that data back into memory, but because it is
incomplete, it follows a broken pointer somewhere and causes a crash.
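
To illustrate the failure mode described above, here is a minimal Python
sketch of reading back a spill file that was cut short. It is not
PostgreSQL code: the 4-byte length prefix and the names write_records,
read_records and simulate_truncated_spill are all hypothetical, chosen
only to show why a reader has to check for short reads instead of
trusting whatever came back.

```python
import io
import struct

# Hypothetical record layout: 4-byte little-endian length, then the payload.
# This is NOT PostgreSQL's on-disk format, only an illustration.
HEADER = struct.Struct("<I")


def write_records(buf, records):
    """Append each payload to buf, preceded by its length."""
    for payload in records:
        buf.write(HEADER.pack(len(payload)))
        buf.write(payload)


def read_records(buf):
    """Yield payloads in order; raise EOFError if a record is cut short."""
    while True:
        header = buf.read(HEADER.size)
        if not header:
            return  # clean end of file, every record was complete
        if len(header) < HEADER.size:
            raise EOFError("truncated record header")
        (length,) = HEADER.unpack(header)
        payload = buf.read(length)
        if len(payload) < length:
            raise EOFError(f"expected {length} bytes, got {len(payload)}")
        yield payload


def simulate_truncated_spill(records, keep_bytes):
    """Write all records, then keep only the first keep_bytes bytes,
    as if the disk had filled up partway through the spill."""
    full = io.BytesIO()
    write_records(full, records)
    return io.BytesIO(full.getvalue()[:keep_bytes])
```

A reader that skips the two length checks would hand a partial record to
its caller, which is the kind of incomplete data that leads to the crash
described above.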

(In our customer case it was actually a bit more complicated: they had
*two* sessions running the same large hash-join query, and one of them
filled up the disk first, then the other did too; some time later one
of them raised an ERROR, freeing up disk space, which allowed the other
to continue until it tried to read hash-join data back and crashed.)

So, yes, the fix will avoid the crash, because once you run out of disk
space, the hash-join is aborted and nothing tries to read the broken
data.
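
The write-side idea can be sketched in a few lines of Python. This is an
illustration under assumptions, not the actual patch: checked_write is a
hypothetical helper, its `write` argument is any callable that returns
the number of bytes written (the way os.write does), and treating a
short write as out-of-space is a choice made for this sketch.

```python
import errno


def checked_write(write, data):
    """Call write(data) and raise if fewer bytes were written than asked.

    `write` must return the number of bytes it actually wrote. A short
    write is reported as ENOSPC here, which is an assumption of this
    sketch, so the caller aborts instead of leaving a partial record
    for a later reader to trip over.
    """
    written = write(data)
    if written != len(data):
        raise OSError(
            errno.ENOSPC,
            f"short write: {written} of {len(data)} bytes",
        )
    return written
```

With a real file descriptor fd this could be called as
checked_write(lambda b: os.write(fd, b), payload); the point is only
that the error surfaces at write time, before anything reads the file.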

You'll probably have to rewrite your query to avoid eating 2TB of disk
space.

--
Álvaro Herrera Valdivia, Chile
