Re: Query Using Massive Temp Space

From: Semler Miloslav <Miloslav(dot)Semler(at)crytur(dot)cz>
To: Cory Tucker <cory(dot)tucker(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query Using Massive Temp Space
Date: 2017-11-20 08:12:04
Message-ID: d43bbfdb9926422b9954abf8cc795483@crytur.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,
I have one question… Why you using so huge amount of grouping columns? Is there some reason for it? It is not definitelly fast method. I would prefer firstly do named query grouped by ids (account_id, candidate_id, parent_id) and then left join table candidates (to get rest of requested columns) out of this named query on this ids if it is possible.
You also avoid of hash indexing of these nine columns so you will save certain ammount of memory.

Miloslav

From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Cory Tucker
Sent: Monday, November 20, 2017 7:17 AM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Query Using Massive Temp Space

Hello,

I have a query that is using a tremendous amount of temp disk space given the overall size of the dataset. I'd love for someone to try to explain what PG is doing and why its using so much space for the query.

First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM. The query in question is a fairly large join of several tables (6) including some aggregations. The overall dataset size of the 6 tables in question is about 20GB and the largest table is about 15M rows. The query is essentially a dump of most of the data from these tables joined together to be used in another system.

When the query runs it begins to use an aggressive amount of temp space on the volume over the course of many hours, until it reaches about 95% capacity and then tapers off. Never completes though. The temp space it uses is around 1.5TB out of a 2TB volume. Again, the total size of the relations in question is only 20GB.

Can anyone explain how the query could possibly use so much temp space?

Query and query plan are attached. Please let me know if any more info is needed.

thanks
--Cory

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ronen Nofar 2017-11-20 08:49:38 Registering Event Log on Windows
Previous Message Luca Ferrari 2017-11-20 08:05:03 Re: unnest-like pg_stats.most_common_values and pg_stats.most_common_freqs