Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly <nicolas(at)gardentechno(dot)com>
To: Igor Neyman <ineyman(at)perceptron(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, Vick Khera <vivek(at)khera(dot)org>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Clustered index to preserve data locality in a multitenant application?
Date: 2016-09-02 15:18:23
Message-ID: CAG3yVS5DL-DJyLpM-Df8yUhE9MuKhuJu8d_kQ752XGqKxDrxdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Sep 1, 2016 at 3:08 PM, Igor Neyman <ineyman(at)perceptron(dot)com> wrote:

> Don’t know about plans to implement clustered indexes in PostgreSQL.
>

It was discussed on the mailing list in the past.

I found an interesting thread dated from 2012 about integrating pg_reorg
(the ancestor of pg_repack) in PostgreSQL core:

https://www.postgresql.org/message-id/flat/CAB7nPqTGmNUFi%2BW6F1iwmf7J-o6sY%2Bxxo6Yb%3DmkUVYT-CG-B5A%40mail.gmail.com

There is also an item titled "Automatically maintain clustering on a table"
in the TODO list:

https://wiki.postgresql.org/wiki/Todo#CLUSTER

Not sure if this was mentioned, MS SQL Server has clustered indexes, where
> heap row is just stored on the leaf level of the index.
>
> Oracle also has similar feature: IOT, Index Organized Table.
>
>
>
> It seems to me (may be I’m wrong), that in PostgreSQL it should be much
> harder to implement clustered index (with the heap row stored in the index
> leaf) because of the way how MVCC implemented: multiple row versions are
> stored in the table itself (e.g. Oracle for that purpose keeps table
> “clean” and stores multiple row versions in UNDO tablespace/segment).
>
>
DB2, like PostgreSQL, stores rows in a heap, and not in the leafs of a
Btree. But it's possible to define a "clustering" key for a table. When it
is defined, DB2 tries to keep the rows in the heap ordered according to the
clustering key. If DB2 can’t find space on the page where the row should
go, then it looks a few pages before and after and puts it there, and if it
still can’t find space then puts it at the end. There is also a feature
called "multidimensional clustering" which is even more sophisticated.
There is also a command REORG, which would be the equivalent of a
non-blocking CLUSTER in PostgreSQL.

I think DB2's approach is interesting because it shows that maintaining
spatial coherency is possible with a heap, without having to store rows in
a Btree (like InnoDB).

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2016-09-02 15:19:03 Re: PG_MODULE_MAGIC issue with small extension
Previous Message David Gibbons 2016-09-02 15:16:00 Re: 2.5TB Migration from SATA to SSD disks - PostgreSQL 9.2