| From: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> | 
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Error while copying a large file in pg_rewind | 
| Date: | 2017-07-03 11:22:47 | 
| Message-ID: | CAGz5QC+8gbkz=Brp0TgoKNqHWTzonbPtPex80U0O6Uh_bevbaA@mail.gmail.com | 
| Lists: | pgsql-hackers | 
Hello all,
pg_rewind throws the following error when a large file is present in
the slave server's data directory:
unexpected result while sending file list: ERROR:  value "2148000000"
is out of range for type integer
CONTEXT:  COPY fetchchunks, line 2402, column begin: "2148000000"
How to reproduce
----------------------------
1. Set up replication between Server A (master) and Server B (slave)
2. Promote the slave server (Server B)
3. Stop the old master (Server A)
4. Create a large file in the newly promoted master's (Server B) data
directory using the command below:
     [root(at)localhost data]# dd if=/dev/zero of=large.file bs=1024 count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 8.32263 s, 492 MB/s
5. Execute the pg_rewind command from the old master (Server A):
     ./pg_rewind -D /home/enterprisedb/master/ --debug --progress
--source-server="port=5661 user=enterprisedb dbname=edb"
IMHO, this is a bug in pg_rewind.
As mentioned in the pg_rewind documentation, a few files are copied in
whole:
"Copy all other files such as pg_xact and configuration files from the
source cluster to the target cluster (everything except the relation
files)." -- https://www.postgresql.org/docs/devel/static/app-pgrewind.html
Those files are copied in chunks of at most CHUNKSIZE (default
1000000) bytes. To drive the copy, pg_rewind creates a table with the
following schema and loads into it one row per chunk that needs to be
copied:
CREATE TEMPORARY TABLE fetchchunks(path text, begin int4, len int4);
postgres=# select * from fetchchunks where begin != 0;
                  path                   |  begin   |   len
-----------------------------------------+----------+---------
 pg_wal/000000010000000000000002         |  1000000 | 1000000
 pg_wal/000000010000000000000002         |  2000000 | 1000000
 pg_wal/000000010000000000000002         |  3000000 | 1000000
 pg_wal/000000010000000000000002         |  4000000 | 1000000
......
and so on.
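To make the failure concrete, here is a small standalone C sketch
(illustrative only -- the function name and structure are made up for
this example, not taken from the pg_rewind source). It prints the
(begin, len) pairs such a chunking loop generates; for the
4096000000-byte file created above, begin = 2148000000 is the first
offset past INT32_MAX, which matches the value in the error message.

#include <stdio.h>
#include <stdint.h>

#define CHUNKSIZE 1000000

/*
 * Illustrative chunking loop: print the (path, begin, len) rows that
 * would be loaded into fetchchunks for a file of the given size.
 * Keeping begin in 64 bits here is fine; the bug is that the
 * fetchchunks.begin column is declared int4.
 */
static void
print_file_chunks(const char *path, int64_t filesize)
{
	for (int64_t begin = 0; begin < filesize; begin += CHUNKSIZE)
	{
		int64_t		len = filesize - begin;

		if (len > CHUNKSIZE)
			len = CHUNKSIZE;
		printf("%s\t%lld\t%lld\n", path,
			   (long long) begin, (long long) len);
	}
}

int
main(void)
{
	print_file_chunks("large.file", 4096000000LL);
	return 0;
}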
The range of an int4 is -2147483648 to +2147483647. For a 4GB file,
begin inevitably goes beyond 2147483647, and the COPY throws the
following error:
unexpected result while sending file list: ERROR:  value "2148000000"
is out of range for type integer
CONTEXT:  COPY fetchchunks, line 2659, column begin: "2148000000"
I guess we have to change the data type of the begin column to bigint.
Also, since there is no standard ntohl() for 8-byte values, we need
some implementation of one. I've attached a script to reproduce the
error and a draft patch.
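For reference, a minimal sketch of one common way to do the 8-byte
conversion (an assumption about the approach, not necessarily what the
attached patch does): read the value as two 32-bit halves, convert
each with ntohl(), and recombine them.

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>		/* ntohl() */

/*
 * Convert an 8-byte big-endian (network order) value in buf to host
 * byte order. There is no standard ntohll(), so apply ntohl() to the
 * high and low 32-bit halves and recombine.
 */
int64_t
recv_int64(const char *buf)
{
	uint32_t	hi;
	uint32_t	lo;

	memcpy(&hi, buf, 4);
	memcpy(&lo, buf + 4, 4);

	return (int64_t) (((uint64_t) ntohl(hi) << 32) | ntohl(lo));
}

The sending side would do the mirror image with two htonl() calls.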
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
| Attachment | Content-Type | Size | 
|---|---|---|
| standby-server-setup.sh | application/x-sh | 2.7 KB | 
| fix_copying_large_file_pg_rewind_v1.patch | application/x-download | 2.1 KB | 