Re: Recommendation to run vacuum FULL in parallel

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Recommendation to run vacuum FULL in parallel
Date: 2019-04-11 07:01:50
Message-ID: b35d52ac-6911-53b3-1aa0-f3fc32d6d02b@gmail.com
Lists: pgsql-general


Look also at pg_stat_all_tables.n_dead_tup for tables which are candidates
for vacuuming.
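
For example, something along these lines lists the tables with the most
dead tuples first (the threshold and LIMIT here are arbitrary):

    SELECT schemaname, relname, n_live_tup, n_dead_tup
      FROM pg_stat_all_tables
     WHERE n_dead_tup > 0
     ORDER BY n_dead_tup DESC
     LIMIT 20;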

On 4/10/19 11:49 PM, Perumal Raj wrote:
> Thanks, Kevin, for the inputs.
>
> In my case there are 500+ tables, the biggest being a 30 GB chunk (table
> only) plus its indexes.
> So I created 6 batches and executed them in parallel. All my scripts
> completed in 2 hours, and my DB size came down from 500 GB to 300 GB.
>
> Yes, I do see a CPU spike, but I did the whole activity with full apps
> downtime.
>
> Going forward I am going to run vacuum on a daily basis to maintain the
> DB size.
>
> Also, table/DB age came down drastically.
>
> Thanks
> Raj
>
> On Thu, Apr 4, 2019 at 12:53 PM Kevin Brannen <KBrannen(at)efji(dot)com> wrote:
>
> *From:* Perumal Raj <perucinci(at)gmail(dot)com>
>
> So to conclude the requirement here: the only way to parallelize is
> multiple scripts, and there is no need to run REINDEX separately.
>
> Question: do we need to consider table dependencies while preparing the
> scripts, in order to avoid table locks during VACUUM FULL?
>
> We have a small bash script (see below) that gets the list of tables
> and their sizes, sorted smallest to largest, and does “vacuum full” one
> at a time, because (as someone else pointed out) this is very I/O
> intensive. That order also helps ensure we finish, because some of
> our installs are at the edge of running out of space (an issue we’re
> dealing with). I probably wouldn’t have a problem doing 2 at a time,
> but we do this in the middle of the night when activity is lowest and
> it only takes 1-2 hours, so we’re good with it. It sounds like you
> have a lot more data, though.
>
> You might also consider putting the data into different tablespaces
> spread over multiple disks to help I/O. If you can, use SSD drives;
> they help with speed quite a bit. 😊
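>
> A minimal sketch (the tablespace name, table name, and path are only
> examples; LOCATION must be an existing empty directory owned by the
> postgres OS user):
>
>     -- create a tablespace on a separate (ideally SSD) volume
>     CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd1/pgdata';
>     -- move a big table onto it; this rewrites the table under a lock
>     ALTER TABLE big_table SET TABLESPACE fast_ssd;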
>
> Don’t worry about table dependencies. This is a physical operation,
> not a data operation.
>
> HTH,
>
> Kevin
>
>     $PGPATH/psql -t -c "
>         WITH s AS (SELECT nspname || '.' || relname AS table_name,
>                           pg_total_relation_size(c.oid) AS total_bytes
>                      FROM pg_class c
>                      LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
>                     WHERE relkind = 'r' AND nspname NOT IN
>                           ('pg_catalog', 'information_schema')
>                     ORDER BY 2)
>         SELECT table_name FROM s
>     " |
>     while read t ; do echo "" ; echo $t ; $PGPATH/vacuumdb -w -z -f -t $t ; done
>
> ###
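>
> If your server is 9.5 or newer, vacuumdb can also parallelize across
> tables by itself with -j, so a single command can replace the loop
> above (the docs caution that -j combined with -f can deadlock if system
> catalogs are processed in parallel). With "yourdb" as a placeholder
> database name:
>
>     $PGPATH/vacuumdb -w -z -f -j 4 yourdb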
>

--
Angular momentum makes the world go 'round.
