Re: Restarting DB after moving to another drive

From: Daniel Begin <jfd553(at)hotmail(dot)com>
To: "'Francisco Olarte'" <folarte(at)peoplecall(dot)com>
Cc: <rod(at)iol(dot)ie>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Restarting DB after moving to another drive
Date: 2015-05-15 19:17:03
Message-ID: COL129-DS12F0368DFE5B840909C1E094C70@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank for that comprehensive response!

And you are right about practicing restore, I never had to :-)

However, I use pg_dump on a regular basis (custom format) but I did not know the difference between database/database cluster (and pg_dumpall) until I had to move everything because the PGDATA drive started overheating.

Now that I better understand what is going on with backup/restore processes, and considering...
- The time it is going to take to rebuild the whole cluster ;
- That I am the only user of the database;
- That everything was just fine with the database, except the temperature of the drive
- And considering the initial concern of this tread was about bad copy of symbolic links with windows

I will make another attempt to copy everything on another drive from windows, unless someone tells me it is not possible.
- I will move my external tablespaces content back to pgdata and drop them for the time I copy the db to the new drive.
- Doing so, I will get rid of the symbolic link (from tablespaces) from where originated the initial error message
- Without symbolic links, I should be able to copy the db using standard windows commands.
- Setting up the new drive's letter to the old one before restarting the db is easy
-The whole process should take 12hours instead of a week.

Hoping it makes sense and that I have not missed something important (again)
Thank for your patience :-)
Daniel

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Francisco Olarte
Sent: May-15-15 12:20
To: Daniel Begin
Cc: rod(at)iol(dot)ie; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Restarting DB after moving to another drive

HI Daniel:

On Fri, May 15, 2015 at 5:35 PM, Daniel Begin <jfd553(at)hotmail(dot)com> wrote:
> Bonjour Francisco.
Buenos dias.

> Skimming the documentation sequentially is a cleaver advice,
> especially since the doc is much of the time well done and exhaustive.
> Unfortunately, even if I actually did it about 1 year ago, it seems
> this specific item slipped out of my mind :-(

Bad luck, you managed to get one of the most important commands, but it's solved now.

> About dump/restore operation, restoring the database cluster is running for 24hrs now (psql -f pgdumpallOutputfile postgres). Since it took 13hrs to dump the cluster, I begin to wonder how long it is going to take to restore it...

Much longer than this, especially if as you say below you have a lot of indexes. It's one of the reasons many of us do not use pg_dumpall for anything but global objects, but use something like the crontab which John R. Pierce posted ( I use a similar thing, but with an intermediate script with dumps critical databases more frequently, skips recreatable ( may be someone can confirm if that word is right ?
I mean test things which can be created from scratch ( as they come from a script ) ) databases and keeps several numbered copies ). Doing it this ways insures we can restore on criticality order if it needs to be done ( I even move unmodified partitions to a 'historic" schema, which gets dumped only after a change, which cuts my backups times to a tenth )

One thing. I do not know how you are restoring the database, but when doing this things we use a specially tuned postgresql.conf ( fsync off, minimal loging, lots of worrk mems and similar things, as we do not care about durability ( you can just rerun initdb and redo the restore, and there is only 1 session connected, the restoring one ).
This cuts the restore times to easily a tenth, then after ending it we restart the server with the normal cong. It is a must when doing this short of things.

> My main concern is about how the indexes are managed in dump/restore operations. I understand that pg_dumpall actually uses pg_dump where the doc says "Post-data items include definitions of indexes, triggers..." I would not worry if the doc said that indexes are simply copied but it says "includes definition of indexes".
> Since some of the indexes took days to build... does someone could confirm indexes are rebuilt instead of copied?
> If indexes are actually rebuilt, why should it be done that way? - There must be good reason!

You are out of luck, and it has a reason. First, pg_dumps does not copy, it dumps. It's simpler behaviour ( text output ) just output a SQL script which recreates everything and inserts all the data ( normally using copy for speed, but it's the same as inserting ). It takes care of generating a fast script ( meaning it creates the tables, then inserts the data, then creates indexes and reactivates constraints, which is faster than defining everything and inserting with indexes and constraints actives ).

The reason to do it in text mode is you can dump between different version and/or architectures, and also the dump is much smaller than the db, specially if you compress it ( I always do it, testing a bit you can always find a compresor with will lead to faster backups, as saved disk writing easily offsets compression times, specially in moder multicpu memory rich machines ). Bear in mind in many scenarios you backup a lot ( we dump some critical things hourly, even if we are using replication ) and restore nearly never, and prefer to use a couple days more for the restore than a couple hours of degraded performance every backup.

This being said, if you have an 820G db ( I still do not know which size is this, I suppose it's $PGDATA footprint ) of important data ( it does not seem critical in availability, as you are taking days and still in bussiness ) and you are having these kind of problems to dump and restore and move directories in your OS, and do not know how much time it takes for backups, you have a problem. You should practice backup AND restore more, because your question indicates you MAY be backing up your data, but you have never restored a backup.

Also, the text output format is really good for the global objects in pg_dumpall, but not so much for the normal databases. For this you should use the custom format, unless it is a really small db. The problem with it is it can only do a database per file, and needs pg_restore to be read ( I know those are minors ). The advantage is instead of generating a plain text dump it builds a kind of tar file with the definitions and data for every object clearly separated, so you can do partial restores or whatever thing you want ( in fact, without options and without connecting to the database pg_restore spits out the same text file that a text dump will generate ). If you had used this technique you could have restored your tables in order, or restored only the data and then reindexed them concurrently with some other ( performance degraded ) work. You can do the same thing by editing the text dump, but it gets impractical and really complex beyond a few megabytes.

regards.
Francisco Olarte.

--
Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In response to

Browse pgsql-general by date

  From Date Subject
Next Message William Dunn 2015-05-15 23:36:52 Re: R: Index on integer or on string field
Previous Message Arthur Silva 2015-05-15 17:32:23 Re: R: Index on integer or on string field