Re: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

From: Kisung Kim <kskim(at)bitnine(dot)net>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Date: 2016-08-12 16:48:49
Message-ID: CABF0Rr3zGQbLM4v7ahyZdUDQ-+fNOpHrpv7Ea+Q2WMts7Uc1Lg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

You're right. Reindex improves the performance of the benchmark workloads
dramatically.
I'm gathering results and will announce them.

But I think we should notice that the results before Reindexing is poorer
than MongoDB.
It seems that this is because of Btree bloating (not exact expression).
The lookup performance for the Btree is most crucial for the results
because the workload is select for primary key.
So larger Btree could mean less cache hits and slower query performance.
I think that PG doesn't pack leaf node to 90% but half for future insertion
and because of this PG's btree is larger than MongoDB
(I turned off prefix compression of WiredTiger index and block compression
for storage.)
But MongoDB (actually WiredTiger) seems to do differently.

Is my speculation is right? I'm not sure because I didn't review the btree
code of PG yet.

And I want to point that the loading performance of MongoDB is better than
PG.
If PG leaves more space for future insertion, then could we get at least
faster loading performance?
Then can we conclude that we have more chances to improve Btree of PG?

Best Regards,

On Fri, Aug 12, 2016 at 5:40 PM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxxxxxx' leading
> to some page splitting.
>
> However your index rebuild gets you from 5 to 3 GB - does that really help
> performance significantly?
>
> regards
>
> Mark
>
>
> On 11/08/16 16:08, Kisung Kim wrote:
>
>> Thank you for your information.
>> Here is the result:
>>
>> After insertions:
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>> version | tree_level | index_size | root_block_no | internal_pages |
>> leaf_pages | empty_pages | deleted_pages | avg_leaf_density |
>> leaf_fragmentation
>> ---------+------------+------------+---------------+--------
>> --------+------------+-------------+---------------+--------
>> ----------+--------------------
>> 2 | 3 | 5488721920 | 44337 | 4464 |
>> 665545 | 0 | 0 | 52 | 11
>> (1 row)
>>
>> After rebuild:
>>
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>> version | tree_level | index_size | root_block_no | internal_pages |
>> leaf_pages | empty_pages | deleted_pages | avg_leaf_density |
>> leaf_fragmentation
>> ---------+------------+------------+---------------+--------
>> --------+------------+-------------+---------------+--------
>> ----------+--------------------
>> 2 | 3 | 3154296832 | 41827 | 1899 |
>> 383146 | 0 | 0 | 90.08 |
>> 0
>>
>>
>> It seems like that rebuild has an effect to reduce the number of internal
>> and leaf_pages and make more dense leaf pages.
>>
>>
>>
>>
>

--

Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : kskim(at)bitnine(dot)net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2016-08-12 16:55:06 Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Previous Message Robert Haas 2016-08-12 16:41:56 Re: Add hint for function named "is"