From: | Jerry Sievers <gsievers19(at)comcast(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeremy Finzel <finzelj(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Unkillable processes creating millions of tiny temp files |
Date: | 2021-03-05 23:44:57 |
Message-ID: | 875z25jhc6.fsf@jsievers.enova.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Tom, thx for the quick response and a few remarks below...
I work at the same site that Jeremy does and we're both looking at this
today.
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Jeremy Finzel <finzelj(at)gmail(dot)com> writes:
>
>> We are running postgres 11.9 (were running 11.7 prior to recent restart) on
>> a large db (10s of TB) with 5 or 6 tablespaces and 1000s of tables/indexes.
Actually it's v11.11 now, but in any case, the issue was evident before
and after the minor version update.
>
>> Within the past few days we have started to see a few queries running for
>> over 8 hours which we then attempt to terminate, but will not terminate.
>> These queries are also generating hundreds of thousands of tiny/empty temp
>> files. In fact, before the restart there were over 23 million files in
>> pg_tmp which were removed. We also have verified no server settings have
>> changed for at least a couple weeks, well before this issue started
>> happening only in the past few days.
>
> Hm. For the query plan you show, I think the only plausible explanation
> for using temp files is that a hash join's hash table is exceeding
> work_mem so it's spilling batches of tuples to disk. With some bad
> luck those could be small not large. But I don't see anything in
> our commit logs between 11.7 and 11.9 that looks like it would have
> affected any of that behavior. (There were some changes to code
> governing temp-file tablespace selection, but that could only affect
> where the files get put not how big they are.) So I doubt that this
> can be blamed on the update, especially since if I read you correctly
> it didn't start happening immediately after the update.
It started happening a couple days ago for no obvious reason, so bad
stats are one of my thrtheories and w'ere running a defensive analyze
through the entire system to rule it out.
> I'm wondering about changes in table statistics possibly causing a
> poorly-chosen change in the hashing parameters. What have you got
> work_mem set to? Can you comment on whether the estimated rowcounts
work_mem 1G. We are not aware of the client overriding this in their
code, but it's possible.
> shown in the query plan are accurate? Does manually ANALYZE'ing the
> tables used in the query change anything?
These other points are still under investigation.
>
> The "unkillable" aspect is odd, but I wonder if that's just a red
> herring. A query that's generated lots of temp files will try to
> clean them up at termination, so maybe the backend is just sitting
> there removing temp files before it'll give control back.
I believe this is confirmed. I see that the backend after being
sig-term'd are now cycling through unlinks as seen by strace -p $pid.
None too quickly I might add and as mentioned earlier, the number of
these files is in the millions so it's hard to predict when cleanup will
finish.
As we did one night prior, a hard shutdown got the rogue jobs closed a
lot more quickly and we noticed that on the recovery restart, Pg took
about 15 minutes to clear >23M files from the temp area. I assume we'd
experience the same again if we need to do it.
I'm going to leave it a alone a a while longer before taking action.
Thanks again
> regards, tom lane
>
>
>
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres(dot)consulting(at)comcast(dot)net
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-03-05 23:57:08 | Re: Unkillable processes creating millions of tiny temp files |
Previous Message | Tom Lane | 2021-03-05 22:28:40 | Re: Unkillable processes creating millions of tiny temp files |