From: | "r(dot)takahashi_2(at)fujitsu(dot)com" <r(dot)takahashi_2(at)fujitsu(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout |
Date: | 2019-09-02 04:42:55 |
Message-ID: | OSBPR01MB4550DAE2F8C9502894A45AAB82BE0@OSBPR01MB4550.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Hi,
pg_basebackup -F t fails when fsync() takes longer than tcp_user_timeout in the following environment.
[Environment]
Postgres 13dev (master branch)
Red Hat Enterprise Linux 7.4
[Error]
$ pg_basebackup -F t --progress --verbose -h <hostname> -D <directory>
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5A000060 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15647"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[Analysis]
- pg_basebackup -F t creates a tar file for each tablespace and calls fsync() on each one.
(In contrast, -F p calls fsync() only once, at the end.)
- While pg_basebackup is in fsync() for one tablespace's tar file, the wal sender is already sending the contents of the next tablespace.
If fsync() takes a long time, pg_basebackup's TCP socket starts returning "zero window" packets to the wal sender.
That is, pg_basebackup's socket receive buffer is full, because pg_basebackup cannot read from the socket while it is blocked in fsync().
- The wal sender's socket keeps retrying the send, but resets the connection once tcp_user_timeout expires.
After the wal sender resets the connection, pg_basebackup can no longer receive data and fails with the error above.
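The stall itself can be reproduced in miniature without PostgreSQL. The following is only a rough sketch, not pg_basebackup or wal sender code: a loopback TCP receiver that never calls recv() stands in for a client blocked in fsync(), and the sender stands in for the wal sender. The function name, chunk size and the 0.5 s stall threshold are illustrative assumptions. Once the socket buffers are full, send() can make no further progress. On Linux, TCP_USER_TIMEOUT (set here only if the platform exposes it) limits how long the kernel tolerates that state before it drops the connection.

```python
import socket


def fill_until_stalled(chunk_size=64 * 1024, stall_seconds=0.5, user_timeout_ms=None):
    """Send to a peer that never reads; return the bytes accepted before the stall."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # any free loopback port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()          # the receiver never calls recv()

    # Linux-only option; skipped where the constant is not available.
    if user_timeout_ms is not None and hasattr(socket, "TCP_USER_TIMEOUT"):
        sender.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)

    sender.settimeout(stall_seconds)         # treat a blocked send() as "stalled"
    chunk = b"\0" * chunk_size
    sent = 0
    try:
        while True:
            sent += sender.send(chunk)       # succeeds until the buffers are full
    except socket.timeout:
        pass                                 # no buffer space freed within stall_seconds
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent


if __name__ == "__main__":
    print("bytes buffered before the sender stalled:", fill_until_stalled())
```

The number printed depends on the kernel's socket buffer settings, so it differs between machines. The point is only that it is finite: a receiver that stops reading eventually stops the sender as well.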
[Solution]
I think calling fsync() for each tablespace is unnecessary.
As with pg_basebackup -F p, I think a single fsync() pass at the end is sufficient.
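To illustrate the ordering I have in mind, here is a minimal sketch in Python rather than a patch. The function name write_then_sync and the archive names are hypothetical, and it assumes a POSIX system where a directory can be opened and passed to fsync(). All files are written first, so nothing blocks the receive loop, and the files and their directory are synced only after the last byte has been written.

```python
import os


def write_then_sync(directory, archives):
    """Write every archive first, then fsync each file and the directory once."""
    paths = []
    for name, chunks in archives.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)               # no fsync() while data is still arriving
        paths.append(path)

    # Single sync pass at the end, after all data has been received.
    for path in paths:
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

    # Sync the directory so the new directory entries are durable too (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return paths
```

The total amount of data flushed is the same as with a per-tablespace fsync(). Only the timing changes, so the slow flush happens after the server has finished sending.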
Do you have any comments?
Regards,
Ryohei Takahashi