Fastest option to transfer db?

From: Israel Brewster <ijbrewster(at)alaska(dot)edu>
To: PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Fastest option to transfer db?
Date: 2021-09-13 17:56:35
Message-ID: 9DD71F3B-4E46-4024-A7A6-0EDCE17F1251@alaska.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a database that I want to transfer from one VM to another. Both VM’s are running on the same (ProxMox) host. select pg_size_pretty(pg_database_size(‘dbname')); shows the database size to be 336GB. What is going to be the fastest method to transfer this data?

- The database cluster has other databases on it, so taking the cluster offline and copying files is not an option
- The specific database I want to transfer is live, receiving data on an ongoing basis. Some downtime is acceptable, days of downtime not so much.

My first attempt was to run the following command from the destination machine:

pg_dump -C -h source.machine.address dbname | psql -h 127.0.0.1 postgres

This command pegged a single core on both machines, and produced network activity of around 30M (according to the proxmox monitor). By my calculations that implies that the transfer should take around 25 hours - not really good enough, and just feels sluggish, given that even just a gig connection should be able to transfer data at near gig speeds - obviously there is a bottleneck somewhere in the system/command.

For my next attempt, I tried the following command from the SOURCE machine:

pg_dump -Fc -Cc -U israel dbname | pg_restore -U israel -h dest.machine.address -d postgres

This resulted in 100% CPU usage on the source machine by pg_dump, around 50% cpu usage on the source by postmaster, and around 30-50% cpu usage on the destination by postgres (I think postmaster vs postgres process name difference is due to CentOS vs Ubuntu? Not sure. Probably doesn’t matter.) Network throughput dropped to only 14M under this scenario, but of course that is compressed data, so time calculations are hard.

During both attempts I saw no disk activity on the destination host (according to proxmox monitoring), nor did the RAM usage increase, so I’m not sure where the data is actually going, which bothers me.

Is there a better way to do this?

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Shubham Mittal 2021-09-13 17:59:49 Re: Query takes around 15 to 20 min over 20Lakh rows
Previous Message Tom Lane 2021-09-13 16:04:48 Re: ERROR: control reached end of function without RETURN, except, it is not a function it is a procedure or a nameless block.