Perplexing, regular decline in performance

From: Hugh Ranalli <hugh(at)whtc(dot)ca>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Perplexing, regular decline in performance
Date: 2019-06-25 15:49:03
Message-ID: CAAhbUMNdqFhdsoTKfE2wPp70+Zcrjs_Zcp7uh-6dnREZnVd5XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm hoping people can help me figure out where to look to solve an odd
PostgreSQL performance problem.

A bit of background: We have a client with a database of approximately 450
GB, that has a couple of tables storing large amounts of text, including
full HTML pages from the Internet. Last fall, they began experiencing
dramatic and exponentially decreasing performance. We track certain query
times, so we know how much time is being spent in calls to the database for
these functions. When this began, the times went from about an average of
approximate 200 ms to 400 ms, rapidly climbing each day before reaching 900
ms, figures we had never seen before, within 4 days, with no appreciable
change in usage. It was at this point that we restarted the database server
and times returned to the 400 ms range, but never back to their
long-running original levels. From this point onward, we had to restart the
database (originally the server, but eventually just the database process)
every 3-4 days, otherwise the application became unusable.

As they were still on PostgreSQL 8.2, we persuaded them to finally
undertake our long-standing recommendation to upgrade, as there was no
possibility of support on that platform. That upgrade to 11.2 was completed
successfully in mid-May, and although times have not returned to their
original levels (they now average approximately 250 ms), the application
overall seems much more responsive and faster (application servers were not
changed, other than minor changes --full text search, explicit casts,
etc.-- to conform to PostgreSQL 11's requirements).

What we continued to notice was a milder but still definite trend of
increased query times, during the course of each week, from the mid to high
200 ms, to the high 300 ms to low 400 ms. Some years ago, someone had
noticed that as the number of "raw_page" columns in a particular table
grew, performance would decline. They wrote a script that once a week locks
the table, deletes the processed large columns (they are not needed after
processing), copies the remaining data to a backup table, truncates the
original table, then copies it back. When this script runs we see an
immediate change in performance, from 380 ms in the hour before the drop,
to 250 ms in the hour of the drop. As rows with these populated columns are
added during the course of a week, the performance drops, steadily, until
the next week's cleaning operation. Each week the performance increase is
clear and significant.

What is perplexing is (and I have triple checked), that this table is *not*
referenced in any way in the queries that we time (it is referenced by
ongoing administrative and processing queries). The operation that cleans
it frees up approximately 15-20 GB of space each week. Our system
monitoring shows this change in free disk space, but this is 20 GB out of
approximately 300 GB of free space (free space is just under 40% of volume
size), so disk space does not seem to be an issue. The table in question is
about 21 GB in size, with about 20 GB in toast data, at its largest.

Even odder, the queries we time *do* reference a much larger table, which
contains very similar data, and multiple columns of it. It is 355 GB in
size, with 318 GB in toast data. It grows continually, with no cleaning.

If anyone has any suggestions as to what sort of statistics to look at, or
why this would be happening, they would be greatly appreciated.

Thanks in advance,
Hugh

--
Hugh Ranalli
Principal Consultant
White Horse Technology Consulting
e: hugh(at)whtc(dot)ca
c: +01-416-994-7957

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Benjamin Scherrey 2019-06-25 15:55:22 Re: Perplexing, regular decline in performance
Previous Message Rick Otten 2019-06-25 11:03:31 Re: materialized view refresh of a foreign table