Re: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Reid Thompson <Reid(dot)Thompson(at)omnicell(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Date: 2021-04-19 13:53:51
Message-ID: 208065.1618840431@sss.pgh.pa.us
Lists: pgsql-general

Reid Thompson <Reid(dot)Thompson(at)omnicell(dot)com> writes:
> Hi I'm looking for some guidance related to the subject line issue.

Is this repeatable? If so, you've found a bug, I think.

> 1. That the error message has been updated ( i.e. SUCCESS is not the proper value)

Yeah, what this really indicates is an incomplete read (file shorter than
expected). Since 11.8, we've improved the error reporting for that, but
that wouldn't in itself fix whatever the underlying problem is.
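
To illustrate why an incomplete read can surface as "SUCCESS": a short read
is not an OS-level failure, so errno is presumably left at zero, and a
message built from errno then prints the text for "no error". The sketch
below is Python, not the server's code; read_exact and ShortReadError are
hypothetical names made up for this example. It only shows the general idea
of reporting byte counts for a short read instead of an errno string.

```python
import os
import tempfile


class ShortReadError(Exception):
    """Raised when a file ends before the expected number of bytes."""


def read_exact(fd, nbytes):
    """Read exactly nbytes from fd, or raise with a specific message.

    A genuine I/O failure raises OSError (errno is meaningful there).
    A short read is reported separately, with the byte counts.
    """
    data = os.read(fd, nbytes)
    if len(data) != nbytes:
        # Not an OS error: errno says nothing useful here, so describe
        # what was actually read versus what was expected.
        raise ShortReadError(f"read only {len(data)} of {nbytes} bytes")
    return data


def demo():
    # Write a 5-byte "temporary file", then try to read an 8-byte record.
    with tempfile.TemporaryFile(buffering=0) as f:
        f.write(b"12345")
        f.seek(0)
        try:
            read_exact(f.fileno(), 8)
        except ShortReadError as exc:
            return str(exc)
    return "no error"


if __name__ == "__main__":
    print(demo())
```

The sketch treats any read shorter than requested as an error, which is
reasonable for a regular file holding fixed-size records but would need a
retry loop for pipes or sockets.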

> 2. That the error is due to running out of temporary space either disk space or maybe temp_buffers?

That could be the proximate cause, although then there would be a bug
that the original write failure wasn't detected. But it seems about
as likely that there's just some inconsistency between what the temp
file writing code wrote and what the reading code expects to read.

Is this a parallelized hash join by any chance? That's new in v11
if memory serves, so it'd be interesting to see if disabling
enable_parallel_hash changes anything.

Anyway, I'd counsel updating to current (11.11), and then if you can
still reproduce the problem, try to reduce it to a self-contained
test case.

regards, tom lane