Re: Logical Replication Speed

From: Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>
To: Andrew Kerber <andrew(dot)kerber(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Logical Replication Speed
Date: 2021-09-29 15:29:40
Message-ID: CAFpL5Vw6sq6D=-0MNUkwF2P+qnFiW5zZnCu_T7et88N71jYDsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Andrew,

Our hardware at the destination is pretty good where all the write is
happening

Source
CPU -20
Mem -180GB

pg_test_fsync -s 5

5 seconds per test

O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 92.364 ops/sec 10827 usecs/op

fdatasync 92.047 ops/sec 10864 usecs/op

fsync 45.893 ops/sec 21790 usecs/op

fsync_writethrough n/a

open_sync 45.849 ops/sec 21811 usecs/op

Compare file sync methods using two 8kB writes:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 33.290 ops/sec 30039 usecs/op

fdatasync 90.455 ops/sec 11055 usecs/op

fsync 45.314 ops/sec 22068 usecs/op

fsync_writethrough n/a

open_sync 22.740 ops/sec 43976 usecs/op

Compare open_sync with different write sizes:

(This is designed to compare the cost of writing 16kB in different write

open_sync sizes.)

1 * 16kB open_sync write 45.280 ops/sec 22085 usecs/op

2 * 8kB open_sync writes 20.371 ops/sec 49088 usecs/op

4 * 4kB open_sync writes 11.346 ops/sec 88140 usecs/op

8 * 2kB open_sync writes 5.334 ops/sec 187477 usecs/op

16 * 1kB open_sync writes 2.671 ops/sec 374454 usecs/op

Test if fsync on non-write file descriptor is honored:

(If the times are similar, fsync() can sync data written on a different

descriptor.)

write, fsync, close 45.814 ops/sec 21828 usecs/op

write, close, fsync 41.236 ops/sec 24251 usecs/op

Non-sync'ed 8kB writes:

write 202147.625 ops/sec 5 usecs/op

Destination
CPU - 60
Mem - 350GB

pg_test_fsync -s 5

5 seconds per test

O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 28146.149 ops/sec 36 usecs/op

fdatasync 26721.581 ops/sec 37 usecs/op

fsync 23500.918 ops/sec 43 usecs/op

fsync_writethrough n/a

open_sync 25170.058 ops/sec 40 usecs/op

Compare file sync methods using two 8kB writes:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 14390.340 ops/sec 69 usecs/op

fdatasync 19655.191 ops/sec 51 usecs/op

fsync 17476.174 ops/sec 57 usecs/op

fsync_writethrough n/a

open_sync 12840.007 ops/sec 78 usecs/op

Compare open_sync with different write sizes:

(This is designed to compare the cost of writing 16kB in different write

open_sync sizes.)

1 * 16kB open_sync write 19287.311 ops/sec 52 usecs/op

2 * 8kB open_sync writes 12786.778 ops/sec 78 usecs/op

4 * 4kB open_sync writes 7527.042 ops/sec 133 usecs/op

8 * 2kB open_sync writes write failed: Invalid argument

On Wed, Sep 29, 2021 at 8:02 PM Andrew Kerber <andrew(dot)kerber(at)gmail(dot)com>
wrote:

> The speed is going to be governed by the available hardware, so improve
> the hardware to improve the speed of the initial data copy. You should also
> be able to add space to the file system for your wal logs without too much
> difficulty in RHEL7.
>
> On Wed, Sep 29, 2021 at 9:26 AM Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>
> wrote:
>
>> Hi Team,
>>
>> We have a 10 TB database most of which is occupied by a single table.
>> This table is around 9.5TB. Bloat is 3.5TB for this table. So, the total
>> table size without bloat is 6TB.
>>
>> We are using native logical replication to remove the bloat for this
>> table to another instance but it takes more than a day to copy 1.5TB of
>> data. With this speed it will take 4-5 days for the initial sync itself.
>> During this initial sync, the replication slot holds WAL and creates disk
>> usage problems on the primary, so we had to stop the replication. We cannot
>> remove the PK since it is the Replica identity.Other indexes and FK have
>> been removed from the destination(secondary).
>>
>> Other changes we have made in the destination(secondary) instance but
>> still no luck
>> - fsync=off,synchronous_commit=off, archive_mode=off.
>>
>> Since it is a single table, we cannot increase worker_processes. Source
>> database is highly transactional and we do not need any downtime.
>>
>> Is there any other option to increase the speed of initial data copy in
>> logical replication?
>>
>> Postgres Version - 11.7
>> OS - OEL7
>>
>> Thanks and Regards,
>> Nikhil
>>
>
>
> --
> Andrew W. Kerber
>
> 'If at first you dont succeed, dont take up skydiving.'
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Sergio Alonso 2021-09-30 16:42:36 Barman setup
Previous Message Andrew Kerber 2021-09-29 14:32:31 Re: Logical Replication Speed