Re: Is it possible to have a "fast-write" Index?

From: deavid <deavidsedice(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is it possible to have a "fast-write" Index?
Date: 2015-06-20 19:39:21
Message-ID: CAFR-75vVeU=fZrOf7kZEVyZcm67pq7oL6MmBks_Oy8S0hanH_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

El vie., 19 jun. 2015 a las 15:06, Simon Riggs (<simon(at)2ndquadrant(dot)com>)
escribió:

> It doesn't say anything about their being only one index buffer per table,
> nor do I think it would make sense to do it that way. So ISTM that the
> foreground process still has to insert serially into N index buffers, with
> each insert being WAL logged.
>
> So the only saving for the foreground process is the random I/O from
> inserting into the indexes, which means the value of the technique is in
> the case where you have many very large secondary indexes - which is now
> covered by BRIN.
>

I'm still learning how postgresql, but, you're assuming when inserting in
bulk into an insert would require the same amount of CPU cycles and the
same amount of kB written compared to doing it row-by-row.

Most memory-based indexes have a bulk load technique that relies in having
the data pre-sorted. Sorting pure random data and then bulk-inserting it
into the index is faster than the classic insertion. (less CPU time, no
idea about IO)

Database indexes are disk-based and there are some points (regarding IO
performance) that are hard for me to fully understand. But seems logic that
would be faster to scan the index only once from begin to end and do
something like a "merge sort" between pre-sorted input and the index.

So I guess I missed something. Maybe is WAL logging the problem? If so,
could this work for TEMP/UNLOGGED tables?

Lots of tables that are heavily written are materialized views (or they
perform more or less the same), so they could be refreshed in case of
server failure. I hope bulk inserts could double the performance;
otherwise, this
idea may not be worth it.

About BRIN indexes, i'm really impressed. They are several times faster
than I could imagine. Also, on select they perform very well. I have to
test them more, with more complex queries (they would work when used on
JOIN clauses?). If select times are good enough even in those cases, then
there's no need for doing bulk-inserts with btree.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-06-20 21:52:03 Re: pg_stat_*_columns?
Previous Message Alvaro Herrera 2015-06-20 18:35:22 Re: Insufficient locking for ALTER DEFAULT PRIVILEGES