Re: How can i be certain autovacuum is causing reuse if table still grows

From: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
To: Sidney Aloisio Ferreira Pryor <sidney(at)tjrj(dot)jus(dot)br>
Cc: Ron <ronljohnsonjr(at)gmail(dot)com>, "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: How can i be certain autovacuum is causing reuse if table still grows
Date: 2020-07-28 13:55:39
Message-ID: CAODZiv4EqkNJ8rOAvth2mmb=06T5j=6Q1yNxLVcM-h8yAobV1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Again, I would caution against going with partitioning until you can first
try and tune the table itself. I have quite a bit of experience with it as
I'm the author of pg_partman so you can see I have nothing against
partitioning itself, but I'm only for using it when it's necessary. If you
decide to go that route, it would be a good tool to help you automate much
of the maintenance process of partitioning either by time or integer.

https://github.com/pgpartman/pg_partman

I say this because, while it could possibly help with making vacuum more
efficient, there are many caveats to partitioning. Since you're on 9.6,
there is no native partitioning, so you'd have to do trigger based
partitioning which has a HUGE impact on write performance (at least 2x
slower if not worse). Even with native partitioning in 10+, there is a
performance penalty during tuple routing and you would want to test that
impact. Especially if you are doing updates which would move data across
child partitions (updates that move data prior to PG10 are very, very
complicated and not supported by pg_partman). Also, if you have any unique
keys to maintain, they are not enforced across the partition set, and only
supported in native if they are also part of the partition key itself.
You're also just adding to the general complexity of your database overall.

I would in general say the primary reason you would want to consider
partitioning in PG is if you're trying to expire old data out. Dropping a
table is much more efficient than running large/many DELETE statements due
to the very issues you're encountering now: bloat. But it sounds like you
are not expiring data, so you also have to take into account how many
partitions you will have long term. It wasn't until PG12 that having over
just even a few hundred partitions could have major performance impact
overall. So, before even considering partitioning, you would want to get
onto the latest major release.

Partitioning can help with vacuum and general maintenance, but that's
really only when the actual live data in a single table starts getting
very, very large. And in that case you may want to first consider your
hardware resources before going down the road of partitioning since that
may be the better benefit long term. You've got nearly 2.3TB of space to
clean up in this table, so, again, you want to solve that problem first. If
you had that much free space before this vacuum finished, you likely have
even more now after it finishes. And just note that a regular vacuum is
likely not going to clean up this space. It may clean up some, but vacuum
only returns space to the system in very specific circumstances (the
tail-end pages are empty). You could use something like pg_repack or
pg_squeeze to do this without a long outage, but with bloat this size, you
may run into issues with how long those tools would need to run. Especially
if the table is still in use. So, again, I would recommend an outage to do
a VACUUM FULL which will lock and rewrite the table. I'd be curious to see
what your total size of the entire table before and after is. The following
function will give that total size (table + indexes)

select pg_total_relation_size('public.mytable');

So, while partitioning could possibly be the answer long term, I would not
recommend it until you've cleaned up the existing bloat on this table and
its indexes and try to tune autovacuum run more efficiently. If that works,
you've saved yourself a great deal of complexity. And again, you'd
definitely want to get on the latest major version of PG where partitioning
has improved tremendously, so if a major version upgrade isn't in the works
for a while, I wouldn't even consider it.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

On Tue, Jul 28, 2020 at 6:18 AM Sidney Aloisio Ferreira Pryor <
sidney(at)tjrj(dot)jus(dot)br> wrote:

> Thank you Ron
> I would appreciate if you could share a good documentation or review link
> about partitioning tables on postgres.
>
> Enviado do meu iPhone
>
> Em 28 de jul de 2020, à(s) 02:16, Ron <ronljohnsonjr(at)gmail(dot)com> escreveu:
>
> 
> > All lines even old ones might be updated by our application.
>
> Ah. Even so, there are benefits to maintaining "lots of smaller chunks"
> instead of One Giant Table: "pg_dump --jobs=X" will run faster, for
> example, when there are more small-to-mid-sized tables instead of A Few
> Giant Tables. Ditto "reindexdb --jobs=X".
>
> Given that even "old" records get updated, I'd look into some other key
> that you can "naturally" partition on.
>
> On 7/27/20 10:54 PM, Sidney Aloisio Ferreira Pryor wrote:
>
> Thank you Ron.
> Yes there is a column id in which value is provided by a sequence.
>
> But it was not clear to me exactly how to partition?
> The idea is to cover a limited range of "id" so there will be less lines
> to vacuum on each operation?
>
> Because yet we will need eventually to vacuum all others.
> All lines even old ones might be updated by our application.
>
> ------------------------------
> *De:* Ron <ronljohnsonjr(at)gmail(dot)com> <ronljohnsonjr(at)gmail(dot)com>
> *Enviado:* terça-feira, 28 de julho de 2020 00:39
> *Para:* pgsql-admin(at)lists(dot)postgresql(dot)org
> <pgsql-admin(at)lists(dot)postgresql(dot)org> <pgsql-admin(at)lists(dot)postgresql(dot)org>
> *Assunto:* Re: How can i be certain autovacuum is causing reuse if table
> still grows
>
> On 7/27/20 5:19 PM, Keith Fiske wrote:
> [snip]
> > Also, I would not recommend partitioning simply to improve vacuuming.
> > Especially if extensive tuning hasn't been tried first. Most times you
> can
> > get per-table tuning working well enough to get autovacuum running
> > properly. Especially on 9.6 and even more-so on PG11, where autovacuum
> has
> > itself been improved.
>
> SIMPLY to improve vacuum performance? No. But there are reasons that
> partitioning was "invented", and minimizing the work needed to be done on
> the whole of a Very Large and Rapidly Growing table is one of them.
>
> --
> Angular momentum makes the world go 'round.
>
>
>
> --
> Angular momentum makes the world go 'round.
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Sidney Aloisio Ferreira Pryor 2020-07-28 18:20:06 RE: How can i be certain autovacuum is causing reuse if table still grows
Previous Message soumik.bhattacharjee 2020-07-28 12:49:39 RE: Cache Hit Ratio%