blending fast and temp space volumes

From: Rick Otten <rottenwindfish(at)gmail(dot)com>
To: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: blending fast and temp space volumes
Date: 2018-02-21 15:53:18
Message-ID: CAMAYy4JeVqeNFSRuYWT5PZVUdxoidQrfHXX7DHE6N-nmu44WQg@mail.gmail.com
Lists: pgsql-performance

Some of my data processes use large quantities of temp space - 5 or 6TB, at
least.

We are running in Google Cloud.  In order to get the best performance out
of all of my queries that might need temp space, I've configured temp space
on a concatenated local (volatile) SSD volume.  In GCE, local SSDs are
more than 20x faster than SAN SSDs.

side note: The disadvantage of local SSD is that it won't survive "hitting
the virtual power button" on an instance, nor can it migrate automatically
to other hardware. (We have to hit the power button to add memory/cpu to
the system, and sometimes the power button might get hit by accident.)
This is OK for temp space. I never have my database come up automatically
on boot, and I have scripted the entire setup of the temp space volume and
data structures. I can run that script before starting the database.
I've done some tests and it seems to work great. I don't mind rolling
back any transaction that might be in play during a power failure.
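For context, the rebuild script is roughly along these lines (device names,
mount point, and paths here are placeholders for illustration, not the real
ones):

```shell
#!/bin/sh
# Rough outline of the boot-time temp-volume rebuild script.
# Device names and paths are placeholders, not the real configuration.
set -e

# Concatenate the local NVMe SSDs into a single LVM logical volume.
pvcreate /dev/nvme0n1 /dev/nvme0n2
vgcreate vg_pgtemp /dev/nvme0n1 /dev/nvme0n2
lvcreate -l 100%FREE -n lv_pgtemp vg_pgtemp

# Fresh filesystem every boot -- the contents are disposable anyway.
mkfs.ext4 -F /dev/vg_pgtemp/lv_pgtemp
mkdir -p /pgtemp_fast
mount /dev/vg_pgtemp/lv_pgtemp /pgtemp_fast

# Restore the directory the tablespace points at before starting postgres.
mkdir -p /pgtemp_fast/pg_tblspc
chown -R postgres:postgres /pgtemp_fast
```

Since the tablespace's catalog entry lives in the data directory (not on the
volatile disk), only the on-disk directory tree has to be rebuilt each time.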

So here is the problem: The largest local SSD configuration I can get in
GCE is 3T. Since I have processes that sometimes use more than that, I've
configured a second temp space volume on regular SAN SSD. My hope was
that if a query ran out of temp space on one volume it would spill over
onto the other volume. Unfortunately it doesn't appear to do that
automatically. When it hits the 3T limit on the one volume, the query
fails. :-(

So, the obvious solution is to anticipate which processes will need more
than 3T temp space and then 'set temp_tablespaces' to not use the 3T
volume. And that is what we'll try next.
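Something like this is what I have in mind (the tablespace names and paths
are invented for illustration):

```sql
-- One-time setup; the directories must already exist and be owned by postgres.
CREATE TABLESPACE temp_fast LOCATION '/pgtemp_fast/pg_tblspc';  -- 3T local SSD
CREATE TABLESPACE temp_big  LOCATION '/pgtemp_san/pg_tblspc';   -- larger SAN SSD

-- Ordinary sessions default to the fast volume:
ALTER SYSTEM SET temp_tablespaces = 'temp_fast';
SELECT pg_reload_conf();

-- A job expected to need more than 3T steers itself to the SAN volume:
SET temp_tablespaces = 'temp_big';
```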

Meanwhile, I'd like other processes to "prefer" the fast volume over the
slow one when the space is available. Ideally I'd like to always use the
fast volume and have the planner know about the different performance
characteristics and capacity of the available temp space volumes and then
choose the best one (speed or size) depending on the query's needs.

I was wondering if anyone had ideas for how to make that possible.
I don't think I want to add the SAN disk to the same LVM volume group as
the local disk, but maybe that would work, since I'm already building it
with a script anyhow ... Is LVM smart enough to optimize radically
different disk performances?
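If I did go the LVM route, I believe a plain linear LV allocates extents from
the physical volumes in the order they are listed, so something like this
might naturally fill the fast disk before spilling onto the slow one (device
names are placeholders):

```shell
# Sketch only: plain LVM has no performance-aware placement, but a linear
# LV takes extents from the listed PVs in order, so the local SSD fills
# first and allocations spill to the SAN disk only after it is full.
pvcreate /dev/nvme0n1 /dev/sdb
vgcreate vg_pgtemp /dev/nvme0n1 /dev/sdb
# List the fast PV first so its extents are allocated first:
lvcreate -l 100%FREE -n lv_pgtemp vg_pgtemp /dev/nvme0n1 /dev/sdb
```

That only helps with allocation order, though; any single file that grows past
the fast disk's extents would still end up partly on the slow disk.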

At the moment it seems like when multiple temp spaces are available, the
temp spaces are chosen in a 'round robin' or perhaps 'random' fashion. Is
that true?

I'm meeting with my GCE account rep next week to see if there is any way to
get more than 3T of local SSD, but I'm skeptical it will be available any
time soon.

thoughts?
