From: | Önder Kalacı <onderkalaci(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com> |
Cc: | Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date: | 2023-03-01 13:21:52 |
Message-ID: | CACawEhWbq207=dDvRzteiXoAtGUS=eTcdkBEJ+Lh-bKkohFqYg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Andres, Amit, Shi Yu, all
Andres Freund <andres(at)anarazel(dot)de>, 28 Şub 2023 Sal, 21:39 tarihinde şunu
yazdı:
> Hi,
>
> On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> > On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci(at)gmail(dot)com>
> wrote:
> > >> I think this overhead seems to be mostly due to the need to perform
> > >> tuples_equal multiple times for duplicate values.
>
> I think more work needs to be done to determine the source of the
> overhead. It's not clear to me why there'd be an increase in tuples_equal()
> calls in the tests upthread.
>
>
You are right, looking closely, in fact, we most of the time do much less
tuples_equal() with index scan.
I've done some profiling with perf, and created flame graphs for the apply
worker, with the
test described above: *-- case 1 (All values are duplicated). *I used the
following commands:
- perf record -F 99 -p 122555 -g -- sleep 60
- perf script | ./stackcollapse-perf.pl > out.perf-folded
- ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg
I attached both flame graphs. I do not see anything specific regarding what
the patch does, but
instead the difference mostly seems to come down to index scan vs
sequential scan related
functions. As I continue to investigate, I thought it might be useful to
share the flame graphs
so that more experienced hackers could comment on the difference.
Regarding my own end-to-end tests: In some runs, the sequential scan is
indeed faster for case-1. But,
when I execute *update tbl set a=a+1; *for 50 consecutive times, and
measure end to end performance, I see
much better results for index scan, only case-1 is on-par as mostly I'd
expect.
Case-1, running the update 50 times and waiting all changes applied
- index scan: 2minutes 36 seconds
- sequential scan: 2minutes 30 seconds
Case-2, running the update 50 times and waiting all changes applied
- index scan: 1 minutes, 2 seconds
- sequential scan: 2minutes 30 seconds
Case-7, running the update 50 times and waiting all changes applied
- index scan: 6 seconds
- sequential scan: 2minutes 26seconds
> # Result
The time executing update (the average of 3 runs is taken, the unit is
milliseconds):
Shi Yu, could it be possible for you to re-run the tests with some more
runs, and share the average?
I suspect maybe your test results have a very small pool size, and some
runs are making
the average slightly problematic.
In my tests, I shared the total time, which is probably also fine.
Thanks,
Onder
Attachment | Content-Type | Size |
---|---|---|
perf_index_scan.svg | image/svg+xml | 90.5 KB |
perf_seq_scan.svg | image/svg+xml | 59.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | gkokolatos | 2023-03-01 13:39:14 | Re: Add LZ4 compression in pg_dump |
Previous Message | Mikhail Gribkov | 2023-03-01 13:12:14 | Re: On login trigger: take three |