Re: Gather performance analysis

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Gather performance analysis
Date: 2021-09-15 07:26:25
Message-ID: CAFiTN-uNByjDK3+_q129NpZBALJngL8p1Kj=JAdqp585DvRQQA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 8, 2021 at 4:41 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

Based on various suggestions, I have done some more experiments with the patch.

1) I measured the cache-miss count and I see a ~20 percentage point
drop in the cache-miss ratio with the patch (which updates the shared
memory counter only after we have written a certain amount of data).
command: perf stat -e cycles,instructions,cache-references,cache-misses -p <receiver-pid>
Head:
13,918,480,258 cycles
21,082,968,730 instructions # 1.51 insn per cycle
13,206,426 cache-references
12,432,402 cache-misses # 94.139 % of all cache refs

Patch:
14,119,691,844 cycles
29,497,239,984 instructions # 2.09 insn per cycle
4,245,819 cache-references
3,085,047 cache-misses # 72.661 % of all cache refs

I have taken multiple samples with different execution times, and the
cache-miss ratio with the patch is 72-74%, whereas without the patch
it is 92-94%. So, as expected, these results clearly show that we are
saving a lot by avoiding cache misses.
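
To make the batching idea concrete, here is a minimal standalone sketch
of a sender that accumulates written bytes locally and publishes them to
the shared counter only once a threshold is reached. This is not the
patch itself: demo_queue, demo_sender, flush_threshold and
publish_count are hypothetical names invented for illustration, and C11
atomics stand in for PostgreSQL's pg_atomic_* API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared queue header; not the real shm_mq. */
typedef struct demo_queue
{
    _Atomic uint64_t bytes_written;  /* shared counter polled by the receiver */
} demo_queue;

/* Hypothetical sender-local state, kept in backend-private memory. */
typedef struct demo_sender
{
    demo_queue *queue;
    uint64_t    pending_bytes;    /* written locally, not yet published */
    uint64_t    flush_threshold;  /* assumed batching threshold, in bytes */
    uint64_t    publish_count;    /* shared-counter updates so far (demo only) */
} demo_sender;

static inline void
demo_sender_init(demo_sender *s, demo_queue *q, uint64_t threshold)
{
    assert(threshold > 0);
    atomic_init(&q->bytes_written, 0);
    s->queue = q;
    s->pending_bytes = 0;
    s->flush_threshold = threshold;
    s->publish_count = 0;
}

/* Publish any locally accumulated bytes to the shared counter. */
static inline void
demo_sender_flush(demo_sender *s)
{
    if (s->pending_bytes == 0)
        return;
    atomic_fetch_add_explicit(&s->queue->bytes_written, s->pending_bytes,
                              memory_order_release);
    s->pending_bytes = 0;
    s->publish_count++;
}

/* Account for nbytes of data; touch shared memory only past the threshold. */
static inline void
demo_sender_write(demo_sender *s, uint64_t nbytes)
{
    s->pending_bytes += nbytes;
    if (s->pending_bytes >= s->flush_threshold)
        demo_sender_flush(s);
}
```

With a 1000-byte threshold, ten 250-byte writes touch the shared counter
only twice instead of ten times; the remaining 500 bytes stay invisible
to the receiver until an explicit flush, which is the source of the
delay examined in the next test.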

2) As pointed out by Tomas, I have tried different test cases where
this patch could regress performance:

CREATE TABLE t1 (a int, b varchar);
INSERT INTO t1 SELECT i, repeat('a', 200) from generate_series(1,200000000) as i;
set enable_gathermerge=off;
Query: select * from t1 where a < 100000 order by a;

Plan:
Sort  (cost=1714422.10..1714645.24 rows=89258 width=15)
  ->  Gather  (cost=1000.00..1707082.55 rows=89258 width=15)
        ->  Parallel Seq Scan on t1  (cost=0.00..1706082.55 rows=22314 width=15)
              Filter: (a < 100000)

So the idea is that without the patch the tuples should reach the Sort
node immediately, whereas with the patch there is some delay before we
send the tuples to the Gather node, because we are batching. Even with
this test, I did not notice any consistent regression with the patch;
however, with EXPLAIN ANALYZE I noticed a 2-3% drop with the patch.

3) I tried some other optimizations pointed out by Andres:
a) Separating read-only and read-write data in shm_mq, and also moving
some fields out of shm_mq

struct shm_mq after the change:

struct shm_mq
{
    /* mostly read-only fields */
    PGPROC     *mq_receiver;
    PGPROC     *mq_sender;
    bool        mq_detached;
    slock_t     mq_mutex;

    /* read-write fields */
    pg_atomic_uint64 mq_bytes_read;
    pg_atomic_uint64 mq_bytes_written;
    char        mq_ring[FLEXIBLE_ARRAY_MEMBER];
};

Note: mq_ring_size and mq_ring_offset moved to shm_mq_handle.

I did not see any extra improvement with this idea.
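
For readers unfamiliar with the technique, the sketch below shows one
common way to keep read-mostly and frequently written fields on
different cache lines, using C11 alignas. It is only an illustration
under stated assumptions: the 64-byte line size, the demo_mq type and
the demo_cache_line_of helper are all hypothetical, the ring buffer is
omitted, and the explicit alignment goes beyond the plain field
regrouping described above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64   /* assumption: typical x86-64 line size */

/* Hypothetical queue header; field roles mirror the grouping above. */
typedef struct demo_mq
{
    /* mostly read-only fields share the first cache line */
    void       *receiver;
    void       *sender;
    bool        detached;

    /* each hot counter starts on its own cache line */
    alignas(DEMO_CACHE_LINE_SIZE) _Atomic uint64_t bytes_read;
    alignas(DEMO_CACHE_LINE_SIZE) _Atomic uint64_t bytes_written;
} demo_mq;

/* Index of the cache line that a given byte offset falls into. */
static inline size_t
demo_cache_line_of(size_t offset)
{
    return offset / DEMO_CACHE_LINE_SIZE;
}
```

Here the receiver's updates to bytes_read and the sender's updates to
bytes_written no longer invalidate the line holding the read-mostly
pointers, at the cost of a larger header.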

4) Another thought was changing "mq_ring_size" to a mask.
- I think this could improve something, but currently "mq_ring_size"
is not a power of 2, so we cannot convert it to a mask directly.
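
As a rough sketch of why the power-of-2 restriction matters: a ring
offset computed with a modulo can be replaced by a bitwise AND only
when the size is a power of 2. The helpers below are hypothetical
(none of them exist in shm_mq.c); rounding the size down to a power of
2 is shown as one possible way to make the mask usable, not as
something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a nonzero power of two. */
static inline bool
demo_is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Largest power of two that is <= n (n must be > 0). */
static inline uint64_t
demo_round_down_pow2(uint64_t n)
{
    uint64_t    p = 1;

    assert(n > 0);
    while (p <= n / 2)
        p <<= 1;
    return p;
}

/* Generic ring offset: works for any ring size, but needs a division. */
static inline uint64_t
demo_offset_mod(uint64_t pos, uint64_t ring_size)
{
    return pos % ring_size;
}

/* Masked ring offset: valid only when ring_size is a power of two. */
static inline uint64_t
demo_offset_mask(uint64_t pos, uint64_t ring_size)
{
    assert(demo_is_power_of_two(ring_size));
    return pos & (ring_size - 1);
}
```

For example, a 65000-byte ring would have to shrink to 32768 bytes
before the mask form gives the same offsets as the modulo form.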

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
