From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlinking Parallel Hash Join inner batch files sooner
Date: 2023-05-10 21:00:31
Message-ID: 20230510230031.4d710fd1@karst
Lists: pgsql-hackers
Hi,
Thanks for working on this!
On Wed, 10 May 2023 15:11:20 +1200
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> One complaint about PHJ is that it can, in rare cases, use a
> surprising amount of temporary disk space where non-parallel HJ would
> not. When it decides that it needs to double the number of batches to
> try to fit each inner batch into memory, and then again and again
> depending on your level of bad luck, it leaves behind all the earlier
> generations of inner batch files to be cleaned up at the end of the
> query. That's stupid. Here's a patch to unlink them sooner, as a
> small improvement.
This patch can indeed save a decent amount of temporary disk space. Considering
its complexity is (currently?) quite low, it's worth it.
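To put a rough number on it, if I understand the repartitioning correctly: each
generation of inner batch files holds more or less a full copy of the spilled
inner tuples. So for a query that goes from 8 to 16 to 32 batches with S bytes
spilled, without the patch all three generations survive until end of query,
roughly 3xS of temp space; with it, only the old and new generations involved
in the current repartitioning coexist at any time.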
> The reason I didn't do this earlier is that sharedtuplestore.c
> continues the pre-existing tradition where each parallel process
> counts what it writes against its own temp_file_limit. At the time I
> thought I'd need to have one process unlink all the files, but if a
> process were to unlink files that it didn't create, that accounting
> system would break. Without some new kind of shared temp_file_limit
> mechanism that doesn't currently exist, per-process counters could go
> negative, creating free money. In the attached patch, I realised
> something that I'd missed before: there is a safe point for each
> backend to unlink just the files that it created, and there is no way
> for a process that created files not to reach that point.
Indeed.
For what it's worth, from my new and inexperienced understanding of the
parallel mechanism, waiting for all workers to reach
WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION, after re-dispatching the old batches
into the new ones, seems like a safe place to instruct each worker to clean up
its old temp files.
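To check that I'm reading the barrier dance correctly, here is a minimal
standalone sketch of that pattern in plain pthreads (emphatically not the PHJ
code itself; the worker count and file names are invented): each worker writes
its own old-generation batch file, all workers meet at a barrier once
repartitioning is done (the equivalent of BarrierArriveAndWait() under
WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION), and only then does each worker
unlink the files it created itself:

/*
 * Standalone sketch, NOT PostgreSQL code: N worker threads each write an
 * "old generation" batch file, synchronize on a barrier once repartitioning
 * is done, then each unlinks only the file it created.  Worker count and
 * file names are invented for illustration.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 4

static pthread_barrier_t repartition_barrier;

static void *
worker(void *arg)
{
	int			id = (int) (intptr_t) arg;
	char		path[64];
	FILE	   *f;

	/* Write this worker's old-generation inner batch file. */
	snprintf(path, sizeof(path), "batchfile.gen0.worker%d", id);
	if ((f = fopen(path, "w")) == NULL)
		return NULL;
	fprintf(f, "spilled inner tuples...\n");
	fclose(f);

	/* ... repartitioning would read this file and write new-generation files ... */

	/*
	 * Wait for every worker to finish repartitioning; in PHJ this would be
	 * the barrier wait under WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION.
	 */
	pthread_barrier_wait(&repartition_barrier);

	/*
	 * Safe point: no one can still be reading the old generation, and each
	 * worker unlinks only what it wrote, so a per-process temp-file counter
	 * is decremented by the same process that incremented it.
	 */
	unlink(path);
	return NULL;
}

int
main(void)
{
	pthread_t	threads[NWORKERS];

	pthread_barrier_init(&repartition_barrier, NULL, NWORKERS);
	for (int i = 0; i < NWORKERS; i++)
		pthread_create(&threads[i], NULL, worker, (void *) (intptr_t) i);
	for (int i = 0; i < NWORKERS; i++)
		pthread_join(threads[i], NULL);
	pthread_barrier_destroy(&repartition_barrier);
	return 0;
}

If that mental model is right, the accounting invariant holds for free: a
temp_file_limit counter only ever goes down in the process where it went up.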
> Here's an example query that tries 8, 16 and then 32 batches on my
> machine, because reltuples is clobbered with a bogus value.
Nice!
Regards,