From: | Önder Kalacı <onderkalaci(at)gmail(dot)com> |
---|---|
To: | |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date: | 2022-07-12 13:36:45 |
Message-ID: | CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
It is often not feasible to use `REPLICA IDENTITY FULL` on the
publication, because it leads to full table scan
per tuple change on the subscription. This makes `REPLICA IDENTITY
FULL` impracticable -- probably other
than some small number of use cases.
With this patch, I'm proposing the following change: If there is an
index on the subscriber, use the index
as long as the planner sub-modules picks any index over sequential scan.
Majority of the logic on the subscriber side has already existed in
the code. The subscriber is already
capable of doing (unique) index scans. With this patch, we are
allowing the index to iterate over the
tuples fetched and only act when tuples are equal. The ones familiar
with this part of the code could
realize that the sequential scan code on the subscriber already
implements the `tuples_equal()` function.
In short, the changes on the subscriber are mostly combining parts of
(unique) index scan and
sequential scan codes.
The decision on whether to use an index (or which index) is mostly
derived from planner infrastructure.
The idea is that on the subscriber we have all the columns. So,
construct all the `Path`s with the
restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ...
AND col_n = $N`. Finally, let
the planner sub-module -- `create_index_paths()` -- to give us the
relevant index `Path`s. On top of
that adds the sequential scan `Path` as well. Finally, pick the
cheapest `Path` among.
From the performance point of view, there are few things to note.
First, the patch aims not to
change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second,
when REPLICA IDENTITY
IS FULL on the publisher and an index is used on the subscriber, the
difference mostly comes down
to `index scan` vs `sequential scan`. That's why it is hard to claim a
certain number of improvements.
It mostly depends on the data size, index and the data distribution.
Still, below I try to showcase the potential improvements using an
index on the subscriber
`pgbench_accounts(bid)`. With the index, the replication catches up
around ~5 seconds.
When the index is dropped, the replication takes around ~300 seconds.
// init source db
pgbench -i -s 100 -p 5432 postgres
psql -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT
pgbench_accounts_pkey;" -p 5432 postgres
psql -c "CREATE INDEX i1 ON pgbench_accounts(aid);" -p 5432 postgres
psql -c "ALTER TABLE pgbench_accounts REPLICA IDENTITY FULL;" -p 5432 postgres
psql -c "CREATE PUBLICATION pub_test_1 FOR TABLE pgbench_accounts;" -p
5432 postgres
// init target db, drop existing primary key
pgbench -i -p 9700 postgres
psql -c "truncate pgbench_accounts;" -p 9700 postgres
psql -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT
pgbench_accounts_pkey;" -p 9700 postgres
psql -c "CREATE SUBSCRIPTION sub_test_1 CONNECTION 'host=localhost
port=5432 user=onderkalaci dbname=postgres' PUBLICATION pub_test_1;"
-p 9700 postgres
// create one index, even on a low cardinality column
psql -c "CREATE INDEX i2 ON pgbench_accounts(bid);" -p 9700 postgres
// now, run some pgbench tests and observe replication
pgbench -t 500 -b tpcb-like -p 5432 postgres
What do hackers think about this change?
Thanks,
Onder Kalaci & Developing the Citus extension for PostgreSQL
Attachment | Content-Type | Size |
---|---|---|
0001_use_index_on_subs_when_pub_rep_ident_full.patch | application/octet-stream | 39.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-07-12 13:37:41 | Re: Making Vars outer-join aware |
Previous Message | Drouvot, Bertrand | 2022-07-12 12:58:27 | Re: Patch proposal: New hooks in the connection path |