From: | Bernd Helmle <mailings(at)oopsware(dot)de> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [PATCH] Add sortsupport for range types and btree_gist |
Date: | 2022-11-30 17:25:25 |
Message-ID: | 77f4bacbeb9d223f90b3a0bd55ef7d65407bfddd.camel@oopsware.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
No deep code review yet, but CF is approaching its end and i didn't
have time to look at this earlier :/
Below are some things i've tested so far.
Am Mittwoch, dem 15.06.2022 um 12:45 +0200 schrieb Christoph Heiss:
> Testing was done using following setup, with about 50 million rows:
>
> CREATE EXTENSION btree_gist;
> CREATE TABLE t (id uuid, block_range int4range);
> CREATE INDEX ON before USING GIST (id, block_range);
> COPY t FROM '..' DELIMITER ',' CSV HEADER;
>
> using
>
> SELECT * FROM t WHERE id = '..' AND block_range && '..'
>
> as test query, using a unpatched instance and one with the patch
> applied.
>
> Some stats for fetching 10,000 random rows using the query above,
> 100 iterations to get good averages.
>
Here are my results with repeating this:
HEAD:
-- token index (buffering=auto)
CREATE INDEX Time: 700213,110 ms (11:40,213)
HEAD patched:
-- token index (buffering=auto)
CREATE INDEX Time: 136229,400 ms (02:16,229)
So index creation speed on the test set (table filled with the tokens
and then creating the index afterwards) gets a lot of speedup with this
patch and default buffering strategy.
> The benchmarking was done on a unpatched instance compiled using the
> > exact same options as with the patch applied.
> > [ Results are noted in a unpatched -> patched fashion. ]
> >
> > First set of results are after the initial CREATE TABLE, CREATE
> INDEX
> > and a COPY to the table, thereby incrementally building the index.
> >
> > Shared Hit Blocks (average): 110.97 -> 78.58
> > Shared Read Blocks (average): 58.90 -> 47.42
> > Execution Time (average): 1.10 -> 0.83 ms
> > I/O Read Time (average): 0.19 -> 0.15 ms
I've changed this a little and did the following:
CREATE EXTENSION btree_gist;
CREATE TABLE t (id uuid, block_range int4range);
COPY t FROM '..' DELIMITER ',' CSV HEADER;
CREATE INDEX ON before USING GIST (id, block_range);
So creating the index _after_ having loaded the tokens.
My configuration was:
shared_buffers = 4G
max_wal_size = 6G
effective_cache_size = 4g # (default, index fits)
maintenance_work_mem = 1G
Here are my numbers from the attached benchmark script
HEAD -> HEAD patched:
Shared Hit Blocks (avg) : 76.81 -> 9.17
Shared Read Blocks (avg): 0.43 -> 0.11
Execution Time (avg) : 0.40 -> 0.05
IO Read Time (avg) : 0.001 -> 0.0007
So with these settings i see an improvement with the provided test set.
Since this patches adds sortsupport for all other existing opclasses, i
thought to give it a try with another test set. What i did was to adapt
the benchmark script (see attached) to use the "pgbench_accounts" table
which i changed to instead using the primary key to have a btree_gist
index on column "aid".
I let pgbench fill its tables with scale = 1000, dropped the primary
key, create the btree_gist on "aid" with default buffering strategy:
pgbench -s 1000 -i bernd
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey ;
CREATE INDEX ON pgbench_accounts USING gist(aid);
Ran the benchmark script bench-gist-pgbench_accounts.py:
The numbers are:
HEAD -> HEAD patched
Shared Hit Blocks (avg) : 4.85 -> 8.75
Shared Read Blocks (avg): 0.14 -> 0.17
Execution Time (avg) : 0.01 -> 0.05
IO Read Time (avg) : 0.0003 -> 0.0009
So numbers got worse here. You can uncover this when using pgbench
against that modified table in a much more worse outcome.
Running
pgbench -s 1000 -c 16 -j 16 -S -Mprepared -T 300
on my workstation at least 3 times gives me the following numbers:
HEAD:
tps = 215338.784398 (without initial connection time)
tps = 212826.513727 (without initial connection time)
tps = 212102.857891 (without initial connection time)
HEAD patched:
tps = 126487.796716 (without initial connection time)
tps = 125076.391528 (without initial connection time)
tps = 124538.946388 (without initial connection time)
So this doesn't look good. While this patch gets a real improvement for
the provided tokens, it makes performance for at least int4 on this
test worse. Though the picture changes again if you build the index
buffered:
tps = 198409.248911 (without initial connection time)
tps = 194431.827394 (without initial connection time)
tps = 195657.532281 (without initial connection time)
which is again close to current HEAD (i have no idea why it is even
*that* slower, since "buffered=on" shouldn't employ sortsupport, no?).
Of course, built time for the index in this case is much slower again:
-- pgbench_accounts index (buffered)
CREATE INDEX Time: 900912,924 ms (15:00,913)
So while providing a huge improvement on index creation speed it's
sometimes still required to carefully check the index quality.
[...]
> Most of the sortsupport for btree_gist was implemented by re-using
> already existing infrastructure. For the few remaining types (bit,
> bool,
> cash, enum, interval, macaddress8 and time) I manually implemented
> them
> directly in btree_gist.
> It might make sense to move them into the backend for uniformity, but
> I
> wanted to get other opinions on that first.
Hmm i'd say we leave them in the contrib module until they are required
somewhere else, too or make a separate patch for them? Do we have plans
to have such requirement in the backend already?
Attached is a rebased patch against current HEAD.
Thanks
Bernd
Attachment | Content-Type | Size |
---|---|---|
bench-gist-pgbench_accounts.py | text/x-python3 | 1.6 KB |
image/png | 42.6 KB | |
image/png | 46.1 KB | |
image/png | 40.2 KB | |
image/png | 46.2 KB | |
benchmark.pgsql.config | text/plain | 170 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-11-30 17:31:55 | Re: New docs chapter on Transaction Management and related changes |
Previous Message | Alvaro Herrera | 2022-11-30 17:20:22 | Re: New docs chapter on Transaction Management and related changes |