Rapid disk usage spikes when updating large tables with GIN indexes

From: Jonathan Marks <jonathanaverymarks(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Rapid disk usage spikes when updating large tables with GIN indexes
Date: 2018-05-14 16:28:05
Message-ID: F16EF890-B574-487B-973F-4D8C13366DAB@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello!

We have a mid-sized database on RDS running 10.1 (32 cores, 240 GB RAM, 5TB total disk space, 20k PIOPS) with several large (100GB+, tens of millions of rows) tables that use GIN indexes for full-text search. We at times need to index very large (hundreds of pages) documents and as a result our tables have a mix of small (tens of tokens) to very large (hundreds of thousands of tokens near to the tsvector 1MB limit). All our GIN indexes have fastupdate turned off — we found that turning fastupdate on led to significant blocking and that we get better average performance with it turned off. We’ve put a lot of effort into tuning our database over the last several years to the point where we have acceptable read and write performance for these tables.

One recurring, and predictable, issue that we have experienced regularly for multiple years is that inserting or updating rows in any table with GIN indexes results in extremely large drops in free disk space — i.e. inserting 10k rows with a total size of 10GB can result in the temporary loss of several hundred gigabytes of free disk space over 2-3 hours (and it could be more — we try to keep a 10-15% buffer of free disk space so that often represents almost all available disk space). Once we stop the operation, free disk space rapidly recovers, which makes us believe that this occurs due to logs, or some kind of temporary table. Our work_mem and maintenance_work_mem settings are pretty large (12GB and 62GB, respectively). The database’s size on disk scarcely budges during this process.

Unfortunately, we’re on RDS, so we’re unable to ssh directly into the instance to see what files are so large, and none of the logs we can see (nor the wal logs) are large enough to explain this process. Any suggestions about where to look to see the cause of this problem (or about any settings we can tune or changes we could make to stop it) would be greatly appreciated.

Thank you!

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Kretschmer 2018-05-14 17:01:18 Re: Recommended way to copy database files on Windows OS (to perform file system level backup)
Previous Message Christoph Moench-Tegeder 2018-05-14 14:14:27 Re: Recommended way to copy database files on Windows OS (to perform file system level backup)