Quick Links

brininsert optimization opportunity

From:	Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Ashwin Agrawal <ashwinstar(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject:	brininsert optimization opportunity
Date:	2023-07-03 22:21:58
Message-ID:	CAE-ML+9r2=aO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello hackers,

My colleague, Ashwin, pointed out to me that brininsert's per-tuple init
of the revmap access struct can have non-trivial overhead.

Turns out he is right. We are saving 24 bytes of memory per-call for
the access struct, and a bit on buffer/locking overhead, with the
attached patch.

The implementation ties the revmap cleanup as a MemoryContext callback
to the IndexInfo struct's MemoryContext, as there is no teardown
function provided by the index AM for end-of-insert-command.

Test setup (local Ubuntu workstation):

# Drop caches and restart between each run:
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches;"
pg_ctl -D /usr/local/pgsql/data/ -l /tmp/logfile restart

\timing
DROP TABLE heap;
CREATE TABLE heap(i int);
CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
INSERT INTO heap SELECT 1 FROM generate_series(1, 200000000);

Results:
We see an improvement for 100M tuples and an even bigger improvement for
200M tuples.

Master (29cf61ade3f245aa40f427a1d6345287ef77e622):

test=# INSERT INTO heap SELECT 1 FROM generate_series(1, 100000000);
INSERT 0 100000000
Time: 222762.159 ms (03:42.762)

-- 3 runs
test=# INSERT INTO heap SELECT 1 FROM generate_series(1, 200000000);
INSERT 0 200000000
Time: 471168.181 ms (07:51.168)
Time: 457071.883 ms (07:37.072)
TimeL 486969.205 ms (08:06.969)

Branch:

test2=# INSERT INTO heap SELECT 1 FROM generate_series(1, 100000000);
INSERT 0 100000000
Time: 200046.519 ms (03:20.047)

-- 3 runs
test2=# INSERT INTO heap SELECT 1 FROM generate_series(1, 200000000);
INSERT 0 200000000
Time: 369041.832 ms (06:09.042)
Time: 365483.382 ms (06:05.483)
Time: 375506.144 ms (06:15.506)

# Profiled backend running INSERT of 100000000 rows
sudo perf record -p 11951 --call-graph fp sleep 180

Please see attached perf diff between master and branch. We see that we
save on a bit of overhead from brinRevmapInitialize(),
brinRevmapTerminate() and lock routines.

Regards,
Soumyadeep (VMware)

Attachment	Content-Type	Size
perf_diff.out	application/octet-stream	16.3 KB
v1-0001-Reuse-revmap-and-brin-desc-in-brininsert.patch	text/x-patch	4.3 KB

Responses

Re: brininsert optimization opportunity at 2023-07-04 11:23:58 from Alvaro Herrera

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2023-07-03 22:29:02	Re: Add information about command path and version of flex in meson output
Previous Message	Tom Lane	2023-07-03 22:20:18	Re: Size vs size_t or, um, PgSize?