Re: Unkillable processes creating millions of tiny temp files

From: Jerry Sievers <gsievers19(at)comcast(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Finzel <finzelj(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unkillable processes creating millions of tiny temp files
Date: 2021-03-05 23:44:57
Message-ID: 875z25jhc6.fsf@jsievers.enova.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Tom, thx for the quick response and a few remarks below...

I work at the same site that Jeremy does and we're both looking at this
today.

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Jeremy Finzel <finzelj(at)gmail(dot)com> writes:
>
>> We are running postgres 11.9 (were running 11.7 prior to recent restart) on
>> a large db (10s of TB) with 5 or 6 tablespaces and 1000s of tables/indexes.

Actually it's v11.11 now, but in any case, the issue was evident before
and after the minor version update.

>
>> Within the past few days we have started to see a few queries running for
>> over 8 hours which we then attempt to terminate, but will not terminate.
>> These queries are also generating hundreds of thousands of tiny/empty temp
>> files. In fact, before the restart there were over 23 million files in
>> pg_tmp which were removed. We also have verified no server settings have
>> changed for at least a couple weeks, well before this issue started
>> happening only in the past few days.
>
> Hm. For the query plan you show, I think the only plausible explanation
> for using temp files is that a hash join's hash table is exceeding
> work_mem so it's spilling batches of tuples to disk. With some bad
> luck those could be small not large. But I don't see anything in
> our commit logs between 11.7 and 11.9 that looks like it would have
> affected any of that behavior. (There were some changes to code
> governing temp-file tablespace selection, but that could only affect
> where the files get put not how big they are.) So I doubt that this
> can be blamed on the update, especially since if I read you correctly
> it didn't start happening immediately after the update.

It started happening a couple days ago for no obvious reason, so bad
stats are one of my thrtheories and w'ere running a defensive analyze
through the entire system to rule it out.

> I'm wondering about changes in table statistics possibly causing a
> poorly-chosen change in the hashing parameters. What have you got
> work_mem set to? Can you comment on whether the estimated rowcounts

work_mem 1G. We are not aware of the client overriding this in their
code, but it's possible.
> shown in the query plan are accurate? Does manually ANALYZE'ing the
> tables used in the query change anything?

These other points are still under investigation.

>
> The "unkillable" aspect is odd, but I wonder if that's just a red
> herring. A query that's generated lots of temp files will try to
> clean them up at termination, so maybe the backend is just sitting
> there removing temp files before it'll give control back.

I believe this is confirmed. I see that the backend after being
sig-term'd are now cycling through unlinks as seen by strace -p $pid.

None too quickly I might add and as mentioned earlier, the number of
these files is in the millions so it's hard to predict when cleanup will
finish.

As we did one night prior, a hard shutdown got the rogue jobs closed a
lot more quickly and we noticed that on the recovery restart, Pg took
about 15 minutes to clear >23M files from the temp area. I assume we'd
experience the same again if we need to do it.

I'm going to leave it a alone a a while longer before taking action.

Thanks again

> regards, tom lane

>
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres(dot)consulting(at)comcast(dot)net

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2021-03-05 23:57:08 Re: Unkillable processes creating millions of tiny temp files
Previous Message Tom Lane 2021-03-05 22:28:40 Re: Unkillable processes creating millions of tiny temp files