Re: Is it possible to have a "fast-write" Index?

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: deavid <deavidsedice(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is it possible to have a "fast-write" Index?
Date: 2015-06-05 18:09:22
Message-ID: CAGTBQpZ6jMvshJ0u-=zEwYY4Lwh70Npp=H_EsYqMr6N+_6UDxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 5, 2015 at 2:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> deavid <deavidsedice(at)gmail(dot)com> writes:
>> Some people already asked for "delayed write" indexes, but the idea gets
>> discarded because the index could get out of sync, so it can omit results
>> and this is unacceptable. But i think maybe that could be fixed in several
>> ways and we can have a fast and reliable index (but maybe not so fast on
>> selects).
>
> FWIW, GIN indexes already implement something that's like your mode 2 but
> a bit better: there's an unordered "pending insertions" list that has to
> be scanned by every search in addition to checking the main index body.
> Every so often the pending insertions list is automatically flushed into
> the main index body.
>
> The reason we do this for GIN is that that index type puts a huge premium
> on doing inserts "in bulk"; it's a lot more efficient if you push many
> rows into the index at once, because frequently they'll be inserting into
> the same per-key posting lists. I do not see much opportunity for a
> corresponding gain for btree.

A forest of btrees (say mode 2.c) may not be a bad idea. When tables
grow consistently, the cost of I/O is usually high in FPW and random
I/O due to the large spread of index updates. I don't have numbers,
but on the databases I've handled it certainly was so.

If you have a btree_forest am that will consist of several btrees that
follow the GIN pattern only instead of an unordered list you have an
ordered btree (which also simplifies merging), you should gain a lot.

The big btrees will be read-only, so they will be compact (100% fill
rate), you will generate less WAL (updates are all local on the small
"staging" btree) and even the disk may perform better with that
pattern.

It is in fact a pattern used by inverted indexes already, so it
wouldn't be too far-fetched.

It is however hard to figure out when compaction has to happen.
Concurrency shouldn't be an issue though, since all but the smallest
btree would be read-only, so you only need a lock while modifying the
forest structure (adding a new btree, swapping components with merged
versions, etc).

It would indeed, though, require a lot of extra storage to perform
compaction. An alternative would be to implement compaction as a
massive insert/delete instead. Certainly, how exactly compaction gets
implemented would be key in deciding whether the approach breaks even.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-06-05 18:11:34 Re: Is it possible to have a "fast-write" Index?
Previous Message Tom Lane 2015-06-05 17:59:36 Re: Is it possible to have a "fast-write" Index?