Re: Strange results using pg_restore

From: Oscar Calderon <ocalderon(at)solucionesaplicativas(dot)com>
To: reiner peterke <zedaardv(at)drizzle(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Strange results using pg_restore
Date: 2013-10-18 17:23:51
Message-ID: CAOqBXCB-CjAC5=z97PzdDmH9ep8x87ArQodTys_0iKh4jUgrrA@mail.gmail.com
Lists: pgsql-admin

Thank you, Kevin Grittner; I was hoping to find an answer like that. Honestly,
even though I've been studying PostgreSQL a lot these days (I'm planning to
take the Associate Certification exam in a couple of weeks), I have no previous
experience migrating a database of that size. I've only migrated databases of
about 20 MB, using SQL backups, from applications we currently host where I
work, for clients that are just starting to work with us, so I had doubts about
those execution times. Just as extra info, here is my machine; the production
server where I'm going to reinstall the database and restore the backups is
even more powerful and dedicated solely to hosting the PostgreSQL cluster, so
it will be faster:

IBM X3850 (that's the model, but I'm not sure about the exact version)
RHEL 5.7
16 GB RAM
Intel Xeon X7350 2.93 GHz (8 cores)
500 GB hard drive (in the future they are planning to move the data to a LUN;
I don't know much about that, but I think I will have to create a new
tablespace on the LUN and then move all the objects into it, roughly as in the
sketch after this list)
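
From what I've read so far, that move would look something like this (the path
and object names below are just placeholders, not the real ones):

-- create a tablespace on the mounted LUN; the directory must already exist
-- and be owned by the postgres system user
CREATE TABLESPACE lun_space LOCATION '/mnt/lun1/pgdata';
-- move individual objects into it
ALTER TABLE some_big_table SET TABLESPACE lun_space;
-- or move a whole database's default tablespace (the database must not be in use)
ALTER DATABASE somedb SET TABLESPACE lun_space;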

Thank you also for the advice about VACUUM FREEZE ANALYZE. I was planning just
to shut down the PostgreSQL service, restore the postgresql.conf tuned for
normal operations (I will keep one copy with settings tuned for restoring and
another with the general tuning), and start the server, but now I will run that
VACUUM before the shutdown. I also didn't know that the shutdown itself can
take some time, so maybe I will wait a couple of minutes before shutting down.
Roughly, the sequence I have in mind is the one sketched below (paths and the
database name are just examples, not the real ones):
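
psql -U postgres -d mydb -c 'VACUUM FREEZE ANALYZE;'   # or: vacuumdb --all --freeze --analyze
pg_ctl -D /var/lib/pgsql/9.3/data stop -m fast         # the shutdown itself can take a while
cp postgresql.conf.normal /var/lib/pgsql/9.3/data/postgresql.conf
pg_ctl -D /var/lib/pgsql/9.3/data start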

And thanks, Reiner. As I said before, I don't have previous migration
experience at this level, so I was unsure, but I'm glad to hear about your
experience with bigger databases. I will look into the pg_upgrade command; I
didn't know about it, although it isn't an option for me in this situation.
They currently have RHEL 4 and are forced to upgrade to RHEL 5.7 to keep
getting support from their RHEL provider (other backup technologies they have
on that server don't let them migrate to the latest stable RHEL version), but
they don't want an in-place upgrade: they want to wipe the server and reinstall
RHEL 5.7. So I have to back everything up and copy it off the machine before
that, and then install PostgreSQL 9.3 and restore the backups.

Thanks to everybody; I will share my experience tomorrow. Even though it isn't
as big a migration as yours, it may serve as a reference for other people in my
situation.

Regards.

***************************
Oscar Calderon
Analista de Sistemas
Soluciones Aplicativas S.A. de C.V.
www.solucionesaplicativas.com
Cel. (503) 7741 7850 Tel. (503) 2522-2834

2013/10/18 reiner peterke <zedaardv(at)drizzle(dot)com>

> Hi Oscar,
>
> I've done some migration work on several of our databases, in the 150-200
> GB range. The times have been around 20-40 minutes, so depending on the
> hardware your times seem reasonable.
>
> Regarding the smaller size: after a pg_restore, the tables and indexes
> will be of 'optimal' size. The restore rebuilds them without any wasted
> space, so there should be a reduction in size compared to the original
> databases.
>
> To verify, check the number of rows and look at sample data. When I want
> a quick comparison, I'll use a query like
>
> SELECT to_char(<timestamp_column>, 'YYYY-MM') AS dt, count(*)
> FROM <table>
> GROUP BY dt ORDER BY dt;
>
> and compare the results between the two, for example as sketched below.
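>
> For example, run the same query against both copies and diff the output
> (the database names here are just examples):
>
> psql -d db_original -Atc "SELECT to_char(<timestamp_column>, 'YYYY-MM') AS dt, count(*) FROM <table> GROUP BY dt ORDER BY dt" > original.txt
> psql -d db_restored -Atc "SELECT to_char(<timestamp_column>, 'YYYY-MM') AS dt, count(*) FROM <table> GROUP BY dt ORDER BY dt" > restored.txt
> diff original.txt restored.txt    # no output means the monthly counts match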
>
> If you have the disk space, I would recommend using the pg_upgrade
> command; a typical invocation is sketched below.
> http://www.postgresql.org/docs/9.3/interactive/pgupgrade.html
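>
> Something like this (the paths are only examples for a Red Hat style
> layout; adjust them to the actual installation):
>
> pg_upgrade \
>   --old-bindir /usr/pgsql-9.1/bin \
>   --new-bindir /usr/pgsql-9.3/bin \
>   --old-datadir /var/lib/pgsql/9.1/data \
>   --new-datadir /var/lib/pgsql/9.3/data \
>   --link    # optional: hard-link files instead of copying (needs same filesystem)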
>
> reiner
>
> On 17 okt 2013, at 17:48, Oscar Calderon <
> ocalderon(at)solucionesaplicativas(dot)com> wrote:
>
> Have a good day. This Friday I'm going to migrate an entire database
> environment for a government institution in my country: four databases of
> about 2 GB each. So I was preparing what I'm going to do tomorrow. They
> currently have PostgreSQL 9.1, installed from source I think, and they
> will update the RHEL version to 5.7. It will be a reinstall of the
> operating system, so before that I will back up the databases, copy the
> contents of pg_hba.conf and so on, and after the reinstallation of RHEL
> I'm going to install PostgreSQL 9.3 from the official repositories, along
> with contrib and PL/Python.
>
> I've already tested that on a CentOS 5.7 VM and it works fine. Yesterday
> I was also practicing creating and restoring backups on my laptop, which
> has the following specs:
>
> Core i7 2.2 Ghz
> 8GB RAM
> 30GB empty space
> Windows 8.1
>
> On my laptop I also have PostgreSQL 9.3. First I used the sample database
> pgdellstore: I created a database and ran the SQL file of the database
> into it. After that, I created a backup of that pgdellstore database with
> pg_dump, in custom format, without compression:
>
> pg_dump -h localhost -p 5432 -U postgres -W -Fc -Z0 -C -d pgdellstore >
> pgdellstore.backup
>
> Everything went as normal. After that I created a second database and
> restored my custom backup with pg_restore, trying to speed it up with the
> -j option:
>
> pg_restore -h localhost -p 5432 -U postgres -W -d pgdellstore -j4
> pgdellstore.backup
>
> Also, before that, I backed up my postgresql.conf and edited the current
> one with the following parameters, based on some research I did on Google
> about performance tuning for pg_restore:
>
> shared_buffers=2GB
> maintenance_work_mem=1GB
> work_mem=128MB
> wal_buffers=16MB
> checkpoint_segments=8
> autovacuum=off
> archive_mode=off
> fsync=off
> full_page_writes=off
> checkpoint_timeout=15min
> checkpoint_completion_target=0.9
> track_counts=off
> synchronous_commit=off
> bgwriter_delay=50ms
>
> And it restored in about 2 seconds, which seemed extremely fast to me (it
> is a database of approximately 25 MB), so I had doubts. I have some
> queries to check database table sizes, and I compared the results from
> the first database (the one created with the original SQL script) and the
> second one (the one created from the custom backup of the first), and I
> got this:
>
> DB created with original SQL script:
>
> schemaname | tablename | reltuples | tamanio | tamanioord
> ------------+--------------+-----------+------------+------------
> public | customers | 20000 | 5016 kB | 5136384
> public | orderlines | 60350 | 4440 kB | 4546560
> public | cust_hist | 60350 | 3976 kB | 4071424
> public | products | 10000 | 1552 kB | 1589248
> public | orders | 12000 | 1384 kB | 1417216
> public | inventory | 10000 | 704 kB | 720896
> public | categories | 16 | 24 kB | 24576
> public | afiliado | 4 | 24 kB | 24576
> public | pruebafechas | 0 | 8192 bytes | 8192
> public | reorder | 0 | 0 bytes | 0
> (10 rows)
>
> DB created with custom backup based on first db
>
> schemaname | tablename | reltuples | tamanio | tamanioord
> ------------+--------------+-----------+------------+------------
> public | customers | 20000 | 4992 kB | 5111808
> public | orderlines | 60350 | 4416 kB | 4521984
> public | cust_hist | 60350 | 3952 kB | 4046848
> public | products | 10000 | 1528 kB | 1564672
> public | orders | 12000 | 1360 kB | 1392640
> public | inventory | 10000 | 680 kB | 696320
> public | afiliado | 4 | 24 kB | 24576
> public | categories | 16 | 24 kB | 24576
> public | pruebafechas | 0 | 8192 bytes | 8192
> public | reorder | 0 | 0 bytes | 0
> (10 rows)
>
> This is the query that i used to get those results:
>
> SELECT tbl.schemaname, tbl.tablename, obj.reltuples,
>        pg_size_pretty(pg_total_relation_size(text('"' || tbl.schemaname || '"."' || tbl.tablename || '"'))) AS tamanio,
>        pg_total_relation_size(text('"' || tbl.schemaname || '"."' || tbl.tablename || '"')) AS tamanioord
> FROM pg_tables tbl, pg_class obj
> WHERE tbl.tablename = obj.relname
>   AND tbl.schemaname NOT IN ('pg_catalog', 'information_schema')
> ORDER BY tamanioord DESC;
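>
> An equivalent that joins on OIDs instead of names (just an alternative;
> it avoids the hand-built quoting and cannot match the wrong relation if
> two schemas have tables with the same name):
>
> SELECT n.nspname AS schemaname, c.relname AS tablename, c.reltuples,
>        pg_size_pretty(pg_total_relation_size(c.oid)) AS tamanio,
>        pg_total_relation_size(c.oid) AS tamanioord
> FROM pg_class c
> JOIN pg_namespace n ON n.oid = c.relnamespace
> WHERE c.relkind = 'r'
>   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
> ORDER BY tamanioord DESC;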
>
> As you can see, there are small variations in size: the number of tuples
> is the same, but the columns tamanio (formatted size) and tamanioord
> (unformatted size) differ; for example, the customers table is 5016 kB in
> one database versus 4992 kB in the other. I'm afraid that I somehow lost
> data or something like that; or perhaps, in your experience, this is
> normal and doesn't mean the information is corrupted. Tomorrow, when I do
> this with the production databases, it will be chaos if information gets
> corrupted.
>
> Also, my second concern is this: four months ago I did routine
> maintenance on those production databases, and first, as usual, I backed
> up all the databases. I took one of the backups, of a development
> database, for testing purposes. The backup file weighs 279 MB and the
> database is about 1.8 GB. So yesterday I also practiced restoring this
> backup into a new database with pg_restore, with the same configuration
> as above. The command I used was this:
>
> pg_restore -h localhost -p 5432 -U postgres -W --create -d debdbrest -j4
> devdb.backup
>
> And it took only about 1:30 minutes. That is suspiciously fast to me,
> because when I was reading about pg_restore and how to improve its
> performance during a restore, I found various reports, on previous
> PostgreSQL versions, of pg_restore being too slow. Also, I had tried the
> same restore of the same backup two days ago, without the modified
> postgresql.conf parameters and without the -j flag; after two hours it
> still hadn't finished, so I cancelled it. In that first try I also
> noticed that it printed a lot of output to the console, while in this
> second try it didn't print anything.
>
> But that time reduction, from more than two hours to about 1:30 minutes,
> is too suspicious for me. I don't know whether, as with my previous
> question, you consider this normal (a 279 MB backup of a 1840 MB database
> restored in just 1:30 minutes).
>
> Regards.
>
> ***************************
> Oscar Calderon
> Analista de Sistemas
> Soluciones Aplicativas S.A. de C.V.
> www.solucionesaplicativas.com
> Cel. (503) 7741 7850 Tel. (503) 2522-2834
>
>
>
