Re: LSM tree for Postgres

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LSM tree for Postgres
Date: 2020-08-04 17:18:01
Message-ID: 3a181d40-659b-2160-eaba-ca94067900ee@postgrespro.ru
Lists: pgsql-hackers

On 04.08.2020 18:11, Tomas Vondra wrote:
> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>> Hi hackers,
>>
>> I want to share results of my last research of implementing LSM index
>> in Postgres.
>> Most of modern databases (RocksDB, MongoDB, Tarantool,...) are using
>> LSM tree instead of classical B-Tree.
>>
>
> I was under the impression that LSM is more an alternative primary
> storage, not for indexes. Or am I wrong / confused?

Yes, originally I considered LSM for IOT (index organized table),
and the RocksDB FDW is actually such an implementation of IOT.
But implementing IOT on top of the existing nbtree indexes is a more
challenging task: I have thought about it but have not tried it yet.

> Interesting, although those are just writes, right? Do you have any
> numbers for read? Also, what are the numbers when you start with "larger
> than RAM" data (i.e. ignoring the initial period when the index fits
> into memory)?

When the benchmark starts, insertion speed is almost the same for Lsm3
and the standard nbtree (we have to insert each record twice, but the
second insertion is done in the background).
At the end of the benchmark, when we are close to 250 million records,
Lsm3 shows about 20 times higher TPS than nbtree.
Overall this gives about a 6 times difference in elapsed time.

> Yeah, I think in general FDW to a database with different consistency
> model is not going to get us very far ... Good for PoC experiments, but
> not really designed for stuff like this. Also, in my experience FDW has
> significant per-row overhead.

From my experience, the main drawback of FDW is the lack of support for
parallel operations.
But that matters mostly for OLAP, not for OLTP.

> BTW how would your approach work with unique indexes, speculative
> inserts etc.?

Unique indexes are not supported now, and I do not see an acceptable
solution here.
If we have to check for the presence of a duplicate at insert time,
it will eliminate all the advantages of the LSM approach.
And if we postpone the check until the merge, then... I am afraid it
will be too late.
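
To spell out why the insert-time check is so expensive: a uniqueness
check can not stop at the in-memory top indexes, it also has to probe
the big main index on disk, which is exactly the random read the LSM
write path is trying to avoid. Below is a toy C sketch of that probe
sequence (my own illustration, not Lsm3 code; level_contains() and the
three-level layout are invented for the example):

/*
 * Toy illustration (not Lsm3 code): enforcing uniqueness at insert time
 * means probing every level of the LSM hierarchy, including the big
 * main B-tree on disk, before the new key can be accepted.
 */
#include <stdbool.h>
#include <stdio.h>

#define NLEVELS 3               /* two small top indexes + one main index */

/* stand-in for a point lookup in one level of the hierarchy */
static bool
level_contains(int level, long key)
{
    /* levels 0 and 1: in-memory top indexes (cheap probes);
     * level 2: main B-tree on disk (expensive random read) */
    return level == 2 && key % 2 == 0;   /* pretend even keys are already stored */
}

static bool
key_already_exists(long key)
{
    for (int lvl = 0; lvl < NLEVELS; lvl++)
        if (level_contains(lvl, key))
            return true;        /* duplicate found, insert must be rejected */
    return false;
}

int
main(void)
{
    printf("insert 42: %s\n", key_already_exists(42) ? "duplicate" : "ok");
    printf("insert 43: %s\n", key_already_exists(43) ? "duplicate" : "ok");
    return 0;
}

The point is that the probe of the last level is unavoidable, so every
insert pays a random read against the main index even when the top
indexes are entirely in memory.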

>
> Isn't it a bit suspicious that with more clients the throughput actually
> drops significantly? Is this merely due to PoC stage, or is there some
> inherent concurrency bottleneck?
>
My explanation is the following (I am not 100% sure that it is true):
multiple clients insert records faster than the merge bgworker is able
to merge them into the main index. This causes the top index to bloat,
and as a result it no longer fits in memory, so we lose the advantage
of fast inserts.
If we had N top indexes instead of just 2, we could keep the size of
each top index small enough. But in that case search operations would
have to merge N indexes, so search becomes almost N times slower (the
fact that each top index fits in memory doesn't mean that all of them
fit in memory at the same time, so we would still have to read pages
from disk during lookups in the top indexes).
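
To make the suspected bottleneck concrete, here is a toy
back-of-the-envelope model (my own illustration with invented numbers,
not measurements): if the combined client insert rate exceeds the rate
at which the merge bgworker can drain the top index, the top index
grows linearly until it no longer fits in RAM, and from that point
inserts degrade to ordinary random B-tree writes.

/*
 * Toy model of the suspected bottleneck (made-up rates, not benchmark
 * data): net growth of the top index is insert_rate - merge_rate, so a
 * sustained surplus eventually pushes the top index out of memory.
 */
#include <stdio.h>

int
main(void)
{
    const double insert_rate = 50000.0;  /* rows/s from all clients (assumed) */
    const double merge_rate  = 30000.0;  /* rows/s the bgworker can merge (assumed) */
    const double ram_limit   = 10e6;     /* rows the top index can hold in RAM (assumed) */

    double top_index_rows = 0.0;

    for (int second = 1; second <= 3600; second++)
    {
        top_index_rows += insert_rate - merge_rate;  /* net growth per second */
        if (top_index_rows > ram_limit)
        {
            printf("after %d s the top index (%.0f rows) no longer fits in RAM\n",
                   second, top_index_rows);
            return 0;
        }
    }
    printf("top index stayed within RAM for the whole run\n");
    return 0;
}

With these invented numbers the top index overflows memory after
roughly 500 seconds; the same logic explains why splitting into N top
indexes only trades this problem for N-way lookups.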
