Re: Rapid disk usage spikes when updating large tables with GIN indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jonathan Marks <jonathanaverymarks(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Rapid disk usage spikes when updating large tables with GIN indexes
Date: 2018-05-14 19:08:15
Message-ID: 31807.1526324895@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

[ please keep the list cc'd ]

Jonathan Marks <jonathanaverymarks(at)gmail(dot)com> writes:
> Thanks for your quick reply. Here’s a bit more information:
> 1) to measure the “size of the database” we run something like `select datname, pg_size_pretty(pg_database_size(datname)) from pg_database;` I’m not sure if this includes WAL size.
> 2) I’ve tried measuring WAL size with `select sum(size) from pg_ls_waldir();` — this also doesn’t budge.
> 3) Our current checkpoint_timeout is 600s with a checkpoint_completion_target of 0.9 — what does that suggest?

Hmph. Your WAL-size query seems on point, and that pretty much destroys
my idea about a WAL emission spike.

pg_database_size() should include all regular and temporary tables/indexes
in the named DB. It doesn't include WAL (but we've eliminated that), nor
cluster-wide tables such as pg_database (but those seem pretty unlikely
to be at issue), nor non-relation temporary files such as sort/hash temp
space. At this point I think we have to focus our attention on what might
be creating large temp files. I do not see anything in the GIN index code
that could do that, especially not if you have fastupdate off. I wonder
whether there is something about the particular bulk-insertion queries
you're using that could result in large temp files --- which'd make the
apparent correlation with GIN index use a mirage, but we're running out
of other ideas. You could try enabling log_temp_files to see if there's
anything to that.

In the grasping-at-straws department: are you quite sure that the extra
disk space consumption is PG's to begin with, rather than something
outside the database entirely?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jonathan Marks 2018-05-14 19:31:42 Re: Rapid disk usage spikes when updating large tables with GIN indexes
Previous Message Tom Lane 2018-05-14 17:16:26 Re: Rapid disk usage spikes when updating large tables with GIN indexes