From: | Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | pgbench on pglogical: bandwidth usage |
Date: | 2018-06-18 10:38:35 |
Message-ID: | 08bdbb29-6e85-d07f-787f-5fe2e544831a@portavita.eu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
I'm researchingon pglogical,performing tests to see the impact on the network traffic, in comparisonwith streaming replication.
I configured one provider and one subscriber both running on a VM, both on Postgres 9.6.3 and latest pglogical 2.2.0.
Forcomparison, Ialso have one master and one hot standby, running on the same VM and on the same Postgres.
I performed 2 different kind of tests. One using pgbench and another manually generating data. But there is something I do not understand when it comes to draw conclusions.Therefore I would appreciate yourhelp.
=Test 1=
I create a schema/table called:
my_replicated_schema.my_first_replicated_table
I ask to provider to add it to the replication set using:
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['my_replicated_schema']);
Then I generate 50 million +1 records on the table:
insert into my_replicated_schema.my_first_replicated_table values (generate_series(0,50000000));'
At this stage I can check how much traffic was generated to pass the records to the other node.
Logical replication costs:
RX bytes:10271325 (9.7 MiB) TX bytes:2465344586 (2.2 GiB)
While streaming replication:
RX bytes:23058328 (21.9 MiB) TX bytes:7502525286 (6.9 GiB)
Conclusion:
Pglogical is more convenient.
Nice.
= Test 2=
Same configuration used to test pglogical during test 1.Accidentally, Idid not change the replication set. So the replication set was still using: 'my_replicated_schema.my_first_replicated_table'
Pgbench instead writes to 'public'schema.
I theninitialize pgbench tables on the provider, using:
pgbench -i
which results in:
\dt public.
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+-------
public | pgbench_accounts | table | postgres
public | pgbench_branches | table | postgres
public | pgbench_history | table | postgres
public | pgbench_tellers | table | postgres
(4 rows)
I then run pgbench on the provider:
pgbench -c 3 -P 10 -r -T 200 -R 1000
And with big surprise, Ican thensee that an amount of traffic was generated:
RX bytes:35500329 (33.8 MiB) TX bytes:66376459 (63.3 MiB)
This is very strange to me. Running similar tests, where on the provider I manually push data on a table which is not in the replication set, no traffic was generated.
There must be an explanation for what is happening, and I think is more related to pgbench than pglogical, but i cannot find it. Do you have pointers?
= Test 3 =
Provider is replicating the schema public, and subscriber receiving it.
I then run pgbenchsame way as above:
pgbench -c 3 -P 10 -r -T 200 -R 1000
But I get results which are contradicting test 1.
Pglogical replication uses:
RX bytes:69783187 (66.5 MiB) TX bytes:371664060 (354.4 MiB)
While streaming replication:
RX bytes:86286353 (82.2 MiB) TX bytes:281693815 (268.6 MiB)
Here, one would say that streaming replication is cheaper..
Also I cannot explain why is that. Do you?
Side notes:
- All tests ran over multiple times, producing identical results
- I also ran a test similar to 'Test1' but updating results (instead of running 'INSERT'), which as expected gave same outcome as 'Test 1'
- At every run Idestroy 'pgbench' database and recreate it, in order to start as clean as possible.
- As cross check, Im also checking that traffic wise,the numbers that appear on the provider are specular on the subscriber
- Here Ireport the exact commands I ran in order to reproduce the test beds for pglogical:
Test 1 and 2:
Provider called 'dbfabio':
PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
psql $pglog_db -c "CREATE extension pglogical;"
psql $pglog_db -c "CREATE schema my_replicated_schema"
psql $pglog_db -c "CREATE table my_replicated_schema.my_first_replicated_table (i int primary key)"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 'provider.$pglog_db', dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.replication_set_add_all_tables('default', ARRAY['my_replicated_schema']);"
Subscriber called 'dbfabio2':
PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
psql $pglog_db -c "CREATE extension pglogical;"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 'subscriber.$pglog_db', dsn := 'host=dbfabio2 port=5432 dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.create_subscription(subscription_name := 'subscription_to_dbfabio_$pglog_db', synchronize_structure := true, provider_dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"
Test 3:
Provider:
PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
pgbench -i
# Now a small hack, since pglogical only accepts tables who have a primary key. pgbench_historical does not have it, out of the box: (maybe here there is some room for an improvement to propose for pgbench code? what do you think?)
psql $pglog_db -c "ALTER TABLE pgbench_history ADD COLUMN id SERIAL PRIMARY KEY;"
psql $pglog_db -c "CREATEextension pglogical;"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 'provider.$pglog_db', dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);"
Subscriber:
Same as test 1
Streaming replication setup looks like:
Master:
wal_level = hot_standby
Standby:
hot_standby = on
+ recovery.conf
any help is appreciated.
regards,
fabio pardi
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2018-06-18 12:11:01 | Re: Slow planning time for simple query |
Previous Message | Abhinav Singh | 2018-06-18 05:16:18 | Replication using PGLogical |