From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Gather performance analysis
Date: 2021-09-15 07:26:25
Message-ID: CAFiTN-uNByjDK3+_q129NpZBALJngL8p1Kj=JAdqp585DvRQQA@mail.gmail.com
Lists: pgsql-hackers
On Wed, Sep 8, 2021 at 4:41 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
Based on various suggestions, I have run some more experiments with the patch.
1) I measured the cache-miss counts, and I see a ~20% reduction in
cache misses with the patch (which updates the shared memory counter
only after a certain amount of data has been written).
command: perf stat -e cycles,instructions,cache-references,cache-misses -p <receiver-pid>

Head:
    13,918,480,258  cycles
    21,082,968,730  instructions        # 1.51 insn per cycle
        13,206,426  cache-references
        12,432,402  cache-misses        # 94.139% of all cache refs

Patch:
    14,119,691,844  cycles
    29,497,239,984  instructions        # 2.09 insn per cycle
         4,245,819  cache-references
         3,085,047  cache-misses        # 72.661% of all cache refs
I took multiple samples with different execution times, and the
cache-miss rate with the patch is 72-74%, whereas without the patch it
is 92-94%. So, as expected, these results clearly show that we are
saving a lot by avoiding cache misses.
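For illustration, here is a minimal sketch of the batching idea (the
function name, the threshold, and the flush flag below are hypothetical,
not taken from the actual patch): the sender accumulates written bytes
locally and touches the shared counter's cache line only once per batch.

/*
 * Hypothetical sketch of deferred counter updates: instead of bumping
 * the shared mq_bytes_written on every write (which keeps invalidating
 * the cache line the receiver is polling), accumulate locally and
 * publish in batches.
 */
#define MQ_WRITE_BATCH 4096         /* hypothetical flush threshold */

static uint64 unflushed_bytes = 0;  /* sender-local, not in shared memory */

static void
mq_bump_bytes_written(shm_mq *mq, uint64 nbytes, bool force_flush)
{
    unflushed_bytes += nbytes;
    if (force_flush || unflushed_bytes >= MQ_WRITE_BATCH)
    {
        /* one shared-memory (and cache-line) update covers many writes */
        pg_atomic_add_fetch_u64(&mq->mq_bytes_written, unflushed_bytes);
        unflushed_bytes = 0;
    }
}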
2) As pointed out by Tomas, I tried test cases where this patch might
regress performance:
CREATE TABLE t1 (a int, b varchar);
INSERT INTO t1 SELECT i, repeat('a', 200) FROM generate_series(1, 200000000) AS i;
set enable_gathermerge=off;
Query: select * from t1 where a < 100000 order by a;
Plan:
Sort  (cost=1714422.10..1714645.24 rows=89258 width=15)
  ->  Gather  (cost=1000.00..1707082.55 rows=89258 width=15)
        ->  Parallel Seq Scan on t1  (cost=0.00..1706082.55 rows=22314 width=15)
              Filter: (a < 100000)
The idea is that without the patch the tuples reach the sort node
immediately, whereas with the patch there is some delay before a tuple
is sent to the gather node because we are batching. Even with this
test, I did not notice any consistent regression with the patch;
however, with EXPLAIN ANALYZE I did notice a 2-3% drop with the patch.
3) I tried some other optimizations pointed out by Andres:
a) Separating the read-only and read-write data in shm_mq, and also
moving some fields out of shm_mq:
struct shm_mq (after the change)
{
    /* mostly read-only fields */
    PGPROC     *mq_receiver;
    PGPROC     *mq_sender;
    bool        mq_detached;
    slock_t     mq_mutex;

    /* read-write fields */
    pg_atomic_uint64 mq_bytes_read;
    pg_atomic_uint64 mq_bytes_written;
    char        mq_ring[FLEXIBLE_ARRAY_MEMBER];
};
Note: mq_ring_size and mq_ring_offset were moved to shm_mq_handle.
I did not see any extra improvement from this idea.
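Since shm_mq_handle is backend-private, the moved fields become
process-local and the shared struct (and its cache lines) stays
smaller. A rough sketch of where they end up (the surrounding fields
and comments are illustrative, not the actual PostgreSQL definition):

typedef struct shm_mq_handle
{
    shm_mq     *mqh_queue;        /* the shared queue itself */
    Size        mqh_ring_size;    /* moved from shm_mq; fixed at creation */
    Size        mqh_ring_offset;  /* moved from shm_mq; fixed at creation */
    /* ... other backend-private state (buffers, positions, etc.) ... */
} shm_mq_handle;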
4) Another thought: changing the "mq_ring_size" modulo to a mask.
I think this could improve something, but currently "mq_ring_size" is
not a power of two, so we cannot convert the modulo operation to a
mask directly.
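To show what the mask would buy us, a sketch under the assumption that
the ring size could be rounded up to a power of two (the helper names
are mine, not from shm_mq.c):

/* generic ring position: a division/modulo on every access */
static inline Size
ring_pos_mod(uint64 nbytes, Size ring_size)
{
    return (Size) (nbytes % ring_size);
}

/* power-of-two ring size: the modulo becomes a cheap bitwise AND */
static inline Size
ring_pos_mask(uint64 nbytes, Size ring_size)
{
    /* valid only when ring_size is a power of 2 */
    return (Size) (nbytes & (ring_size - 1));
}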
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com