Re: How can i be certain autovacuum is causing reuse if table still grows

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Sidney Aloisio Ferreira Pryor <sidney(at)tjrj(dot)jus(dot)br>
Cc: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>, Ron <ronljohnsonjr(at)gmail(dot)com>, "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: How can i be certain autovacuum is causing reuse if table still grows
Date: 2020-08-04 09:03:44
Message-ID: CAECtzeUwzjaX2KqT4oFiodFpCw7PSsVSj_21OuYvz+uU_8M29A@mail.gmail.com
Lists: pgsql-admin

On Mon, Aug 3, 2020 at 23:36 Sidney Aloisio Ferreira Pryor <
sidney(at)tjrj(dot)jus(dot)br> wrote:

> Hi Keith, how are you?
> Hope you are doing well.
>
> As I promised, we ran VACUUM FULL on that table in a test environment
> similar to production, and it took 11 hours to complete.
> It reduced the original table segment from 2.3TB to 100GB, as shown below
> (note that the original is still growing...).
> So we are now proposing to our management a maintenance window to run
> VACUUM FULL in production.
>
> Original:
> table_schema | table_name | total_size | data_size |
> external_size
>
> --------------+---------------------+------------+-----------+---------------
> public | flip_pagina_edicao | 4578 GB | 2880 GB | 1699 GB
>
> Copy of Original (after VACUUM FULL):
> table_schema | table_name | total_size | data_size |
> external_size
>
> --------------+---------------------+------------+-----------+---------------
> public | flip_pagina_edicao | 180 GB | 100 GB | 80 GB
>
>
> In the meantime, could you please help us understand some curious facts
> below about VACUUM VERBOSE (not FULL)?
> I am running another VACUUM VERBOSE (not FULL) on production because the
> first one was stuck on the same indexes for 3 days.
> But now it is on a 3rd sequential scan of all indexes of that table,
> followed by a scan of the table itself.
> It does not finish, even after completing the scans.
>
> 1) Why is it scanning again and not finishing the vacuum?
>
>
Your maintenance_work_mem parameter value is probably too low.
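If it helps, here is how you could check the current value and raise it for
the manual VACUUM session only (a sketch; the 1GB figure is just an example,
not something from your setup — size it to what your server's RAM allows):

```sql
-- Check the current value; the default is only 64MB.
SHOW maintenance_work_mem;

-- Raise it for this session only, then re-run the vacuum.
SET maintenance_work_mem = '1GB';
VACUUM VERBOSE public.flip_pagina_edicao;
```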

2) I noticed that on every scan it spends 12 hours on just one index, on a
> bpchar(1) column (character(1) DEFAULT 'N'::bpchar).
> All the other indexes, and even the table itself, run much faster.
> Can you guess a reason for this?
>
> CREATE INDEX ocrindex
> ON public.flip_pagina_edicao
> USING btree
> (ocr COLLATE pg_catalog."default");
>
>
Perhaps it's not used, so not in cache? Or it's much bigger than the other
ones?
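One way to check both guesses (a sketch against the standard statistics
catalogs; run it on your database) is to compare each index's size and scan
count:

```sql
-- Size and usage of every index on the table; a large,
-- rarely-scanned index would support either guess.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'flip_pagina_edicao'
ORDER BY pg_relation_size(indexrelid) DESC;
```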

3) And it repeatedly scans 11.184.520 (or 521, or 528) row versions.
> Why does it not finish, instead of scanning the same number of row
> versions again?
>
>
Well, VACUUM is a multi-step operation. It first scans the table and collects
in memory the TID of every row version that needs to be deleted. Once that is
done, it scans the indexes. But if there are more TIDs than memory can hold,
it has to do multiple passes over the table and the indexes. It seems clear
in your case that memory fills up every time VACUUM has accumulated about
11184520 TIDs. That memory is limited by a parameter named
maintenance_work_mem. So what is its current value?
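For what it's worth, the dead-TID array uses 6 bytes per row version, so the
per-pass count in your log is very close to what the default 64MB
maintenance_work_mem can hold (a back-of-the-envelope check, assuming you are
still at the default):

```sql
-- Number of 6-byte TIDs that fit in 64MB:
SELECT pg_size_bytes('64MB') / 6 AS tid_capacity;
-- 11184810, right around the 11184520 row versions per pass in your log
```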

flip=# vacuum verbose flip_pagina_edicao;
> INFO: vacuuming "public.flip_pagina_edicao"
> INFO: scanned index "id_caderno" to remove 11184520 row versions
> DETAIL: CPU 155.34s/540.40u sec elapsed 1101.49 sec
> INFO: scanned index "ordem_index" to remove 11184520 row versions
> DETAIL: CPU 253.35s/870.77u sec elapsed 1962.36 sec
> INFO: scanned index "primary00024" to remove 11184520 row versions
> DETAIL: CPU 134.10s/478.79u sec elapsed 922.02 sec
> INFO: scanned index "ordem" to remove 11184520 row versions
> DETAIL: CPU 265.16s/932.90u sec elapsed 1878.20 sec
> INFO: scanned index "flip_pagina_edicao_pkey" to remove 11184520 row
> versions
> DETAIL: CPU 149.51s/523.42u sec elapsed 949.68 sec
> INFO: scanned index "flip_flippagedic_idcaderno" to remove 11184520 row
> versions
> DETAIL: CPU 460.19s/1171.75u sec elapsed 2696.89 sec
> INFO: scanned index "nomepdfindex" to remove 11184520 row versions
> DETAIL: CPU 598.88s/1286.84u sec elapsed 3543.26 sec
> INFO: scanned index "ocrindex" to remove 11184520 row versions
> DETAIL: CPU 25657.30s/18907.82u sec elapsed 46652.49 sec
> INFO: scanned index "idr_documento_index" to remove 11184520 row versions
> DETAIL: CPU 321.96s/968.75u sec elapsed 2004.91 sec
> INFO: "flip_pagina_edicao": removed 11184520 row versions in 3762529 pages
> DETAIL: CPU 165.65s/95.06u sec elapsed 647.49 sec
> INFO: scanned index "id_caderno" to remove 11184521 row versions
> DETAIL: CPU 175.89s/461.26u sec elapsed 1098.74 sec
> INFO: scanned index "ordem_index" to remove 11184521 row versions
> DETAIL: CPU 301.57s/887.26u sec elapsed 2052.44 sec
> INFO: scanned index "primary00024" to remove 11184521 row versions
> DETAIL: CPU 150.12s/461.92u sec elapsed 947.74 sec
> INFO: scanned index "ordem" to remove 11184521 row versions
> DETAIL: CPU 286.29s/896.03u sec elapsed 1954.21 sec
> INFO: scanned index "flip_pagina_edicao_pkey" to remove 11184521 row
> versions
> DETAIL: CPU 140.75s/423.38u sec elapsed 901.04 sec
> INFO: scanned index "flip_flippagedic_idcaderno" to remove 11184521 row
> versions
> DETAIL: CPU 455.24s/1043.27u sec elapsed 2551.54 sec
> INFO: scanned index "nomepdfindex" to remove 11184521 row versions
> DETAIL: CPU 644.78s/1163.80u sec elapsed 3469.67 sec
> INFO: scanned index "ocrindex" to remove 11184521 row versions
> DETAIL: CPU 25951.95s/18712.36u sec elapsed 49880.37 sec
> INFO: scanned index "idr_documento_index" to remove 11184521 row versions
> DETAIL: CPU 274.05s/710.37u sec elapsed 1799.98 sec
> INFO: "flip_pagina_edicao": removed 11184521 row versions in 2358457 pages
> DETAIL: CPU 68.83s/36.81u sec elapsed 353.40 sec
> INFO: scanned index "id_caderno" to remove 11184528 row versions
> DETAIL: CPU 156.47s/372.75u sec elapsed 1022.31 sec
> INFO: scanned index "ordem_index" to remove 11184528 row versions
> DETAIL: CPU 257.13s/669.86u sec elapsed 2057.05 sec
> INFO: scanned index "primary00024" to remove 11184528 row versions
> DETAIL: CPU 116.29s/319.86u sec elapsed 842.55 sec
> INFO: scanned index "ordem" to remove 11184528 row versions
> DETAIL: CPU 232.78s/562.22u sec elapsed 2676.35 sec
> INFO: scanned index "flip_pagina_edicao_pkey" to remove 11184528 row
> versions
> DETAIL: CPU 91.21s/241.00u sec elapsed 875.27 sec
> INFO: scanned index "flip_flippagedic_idcaderno" to remove 11184528 row
> versions
> DETAIL: CPU 285.71s/585.63u sec elapsed 2593.08 sec
> INFO: scanned index "nomepdfindex" to remove 11184528 row versions
> DETAIL: CPU 413.55s/692.77u sec elapsed 3843.78 sec
>
> Thank you.
> Sidney Pryor.
>
>
> ------------------------------
> *From:* Sidney Aloisio Ferreira Pryor <sidney(at)tjrj(dot)jus(dot)br>
> *Sent:* Tuesday, July 28, 2020 3:20 PM
> *To:* Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
> *Cc:* Ron <ronljohnsonjr(at)gmail(dot)com>; pgsql-admin(at)lists(dot)postgresql(dot)org <
> pgsql-admin(at)lists(dot)postgresql(dot)org>
> *Subject:* RE: How can i be certain autovacuum is causing reuse if table
> still grows
>
> Thank you very much for this complete explanation and opinion, Keith.
>
> The first goal of publishing this mail on the list was to learn how to
> measure free space precisely: pgstattuple is the correct tool.
> The second goal is under way: a manual vacuum to accomplish reuse of
> that large free area. (I will post pgstattuple output for the table when
> this vacuum ends.)
> And the third goal is to free and downsize this huge allocation of space:
> schedule an outage to run VACUUM FULL.
>
> Again, I am truly grateful for your precious help.
> And I will post pg_total_relation_size here before and after running
> VACUUM FULL when we schedule that outage.
>
> Best Regards,
> Sidney Pryor
>
> ------------------------------
> *From:* Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
> *Sent:* Tuesday, July 28, 2020 10:55 AM
> *To:* Sidney Aloisio Ferreira Pryor <sidney(at)tjrj(dot)jus(dot)br>
> *Cc:* Ron <ronljohnsonjr(at)gmail(dot)com>; pgsql-admin(at)lists(dot)postgresql(dot)org <
> pgsql-admin(at)lists(dot)postgresql(dot)org>
> *Subject:* Re: How can i be certain autovacuum is causing reuse if table
> still grows
>
> Again, I would caution against going with partitioning until you can first
> try and tune the table itself. I have quite a bit of experience with it as
> I'm the author of pg_partman so you can see I have nothing against
> partitioning itself, but I'm only for using it when it's necessary. If you
> decide to go that route, it would be a good tool to help you automate much
> of the maintenance process of partitioning either by time or integer.
>
> https://github.com/pgpartman/pg_partman
>
> I say this because, while it could possibly help with making vacuum more
> efficient, there are many caveats to partitioning. Since you're on 9.6,
> there is no native partitioning, so you'd have to do trigger based
> partitioning which has a HUGE impact on write performance (at least 2x
> slower if not worse). Even with native partitioning in 10+, there is a
> performance penalty during tuple routing and you would want to test that
> impact. Especially if you are doing updates which would move data across
> child partitions (updates that move data prior to PG10 are very, very
> complicated and not supported by pg_partman). Also, if you have any unique
> keys to maintain, they are not enforced across the partition set, and only
> supported in native if they are also part of the partition key itself.
> You're also just adding to the general complexity of your database overall.
>
> I would in general say the primary reason you would want to consider
> partitioning in PG is if you're trying to expire old data out. Dropping a
> table is much more efficient than running large/many DELETE statements due
> to the very issues you're encountering now: bloat. But it sounds like you
> are not expiring data, so you also have to take into account how many
> partitions you will have long term. Until PG12, having even just a few
> hundred partitions could have a major performance impact overall. So,
> before even considering partitioning, you would want to get onto the
> latest major release.
>
> Partitioning can help with vacuum and general maintenance, but that's
> really only when the actual live data in a single table starts getting
> very, very large. And in that case you may want to first consider your
> hardware resources before going down the road of partitioning since that
> may be the better benefit long term. You've got nearly 2.3TB of space to
> clean up in this table, so, again, you want to solve that problem first. If
> you had that much free space before this vacuum finished, you likely have
> even more now after it finishes. And just note that a regular vacuum is
> likely not going to clean up this space. It may clean up some, but vacuum
> only returns space to the system in very specific circumstances (the
> tail-end pages are empty). You could use something like pg_repack or
> pg_squeeze to do this without a long outage, but with bloat this size, you
> may run into issues with how long those tools would need to run. Especially
> if the table is still in use. So, again, I would recommend an outage to do
> a VACUUM FULL, which will lock and rewrite the table. I'd be curious to see
> what the total size of the entire table is before and after. The following
> function gives that total size (table + indexes):
>
> select pg_total_relation_size('public.mytable');
>
> So, while partitioning could possibly be the answer long term, I would not
> recommend it until you've cleaned up the existing bloat on this table and
> its indexes and tried to tune autovacuum to run more efficiently. If that
> works,
> you've saved yourself a great deal of complexity. And again, you'd
> definitely want to get on the latest major version of PG where partitioning
> has improved tremendously, so if a major version upgrade isn't in the works
> for a while, I wouldn't even consider it.
>
> --
> Keith Fiske
> Senior Database Engineer
> Crunchy Data - http://crunchydata.com
>
>
> On Tue, Jul 28, 2020 at 6:18 AM Sidney Aloisio Ferreira Pryor <
> sidney(at)tjrj(dot)jus(dot)br> wrote:
>
> Thank you Ron
> I would appreciate if you could share a good documentation or review link
> about partitioning tables on postgres.
>
> Sent from my iPhone
>
> On Jul 28, 2020, at 02:16, Ron <ronljohnsonjr(at)gmail(dot)com> wrote:
>
> 
> > All lines even old ones might be updated by our application.
>
> Ah. Even so, there are benefits to maintaining "lots of smaller chunks"
> instead of One Giant Table: "pg_dump --jobs=X" will run faster, for
> example, when there are more small-to-mid-sized tables instead of A Few
> Giant Tables. Ditto "reindexdb --jobs=X".
>
> Given that even "old" records get updated, I'd look into some other key
> that you can "naturally" partition on.
>
> On 7/27/20 10:54 PM, Sidney Aloisio Ferreira Pryor wrote:
>
> Thank you Ron.
> Yes, there is a column id whose value is provided by a sequence.
>
> But it is not clear to me exactly how to partition.
> The idea is to cover a limited range of "id" so there will be fewer rows
> to vacuum in each operation?
>
> Because we will still eventually need to vacuum all the others.
> All rows, even old ones, might be updated by our application.
>
> ------------------------------
> *From:* Ron <ronljohnsonjr(at)gmail(dot)com> <ronljohnsonjr(at)gmail(dot)com>
> *Sent:* Tuesday, July 28, 2020 12:39 AM
> *To:* pgsql-admin(at)lists(dot)postgresql(dot)org
> <pgsql-admin(at)lists(dot)postgresql(dot)org> <pgsql-admin(at)lists(dot)postgresql(dot)org>
> *Subject:* Re: How can i be certain autovacuum is causing reuse if table
> still grows
>
> On 7/27/20 5:19 PM, Keith Fiske wrote:
> [snip]
> > Also, I would not recommend partitioning simply to improve vacuuming.
> > Especially if extensive tuning hasn't been tried first. Most times you
> > can get per-table tuning working well enough to get autovacuum running
> > properly. Especially on 9.6 and even more-so on PG11, where autovacuum
> > has itself been improved.
>
> SIMPLY to improve vacuum performance? No. But there are reasons that
> partitioning was "invented", and minimizing the work needed to be done on
> the whole of a Very Large and Rapidly Growing table is one of them.
>
> --
> Angular momentum makes the world go 'round.
>
>
>
> --
> Angular momentum makes the world go 'round.
>
>
>
>

--
Guillaume.
