Re: Does the work made by vaccum in the current pass is lost when interrupted?

From: Allan Kamau <kamauallan(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Does the work made by vaccum in the current pass is lost when interrupted?
Date: 2020-10-09 11:31:43
Message-ID: CAF3N6oSuBs58xme8pjGpr=+TqNEB_QTUHW2pNv4n9KX+AbXwMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Oct 8, 2020 at 10:16 PM Jean-Marc Lessard <
Jean-Marc(dot)Lessard(at)ultra-ft(dot)com> wrote:

> I have a large table (billions of records) which has not been vacuum and
> bloated.
>
> Vacuum scale factor was left at the default.
>
>
>
> I ran a vacuum on a DEV system and it makes several passes (scanning
> heap/vacuuming indexes/vacumming heap) which take more than an hour each.
>
> On a PROD system, I may have to kill the job midway.
>
> Should I reduce the autovacuum_work_mem of my session? Currently 1GB
>
> Does the work made in the current pass is lost when interrupted?
>
> Correct me if I am not right, but vacuumed indexes and heap made by the
> current pass will go to disk, but the heap should be rescan on the next
> vacuum.
>
> I guess that the visibility map is updated only at the end of each pass.
>
>
>
> My best regards
>
>
>
> Thank you
>
>
>
>
> *Jean-Marc Lessard*
> Administrateur de base de données / Database Administrator
>
>
>
> Ultra Electronics Forensic Technology Inc.
> T +1 514 489 4247 x4164
> www.ultra-forensictechnology.com
>
>
>

A suggestion, considering that you have a table with billions of tuples,
you could look into table partitioning, see "
https://www.postgresql.org/docs/current/ddl-partitioning.html".

First you may need to identify a criteria by which to partition your data,
maybe by record population timestamp binned into yyyymm resolution.

One a test environment, you may construct a table similar to the one
currently having the many records but with a different name.

Then write code which will construct the child tables, run the sql to
construct the child tables.
If your child tables are based on the record date, you may construct child
tables for the timestamps of the record you have as well as several months
or timepoints in the future. Maybe have a cron job to construct new tables
of future timepoints.

Then write code to populate the new tables directly in piecemeal by the use
of WHERE clause with data from the current production table.

You may choose to write the above code to use "COPY" or "INSERT" to
populate the specific partition table.

Clone your current application and modify the code such that it inserts
directly to the specific child table or leave the writing to the specific
child table to be done by the on insert trigger.

Test the data population via the application to see if the child tables are
being populated accordingly.

If all is well. Schedule downtime where you can implement these changes to
your production environment.

Allan.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Lewis 2020-10-09 14:47:33 Re: Does the work made by vaccum in the current pass is lost when interrupted?
Previous Message Tony Shelver 2020-10-09 09:34:30 Re: What's your experience with using Postgres in IoT-contexts?