Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly <nicolas(at)gardentechno(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Clustered index to preserve data locality in a multitenant application?
Date: 2016-08-30 11:12:33
Message-ID: CAG3yVS4UMAivqq+kms1d=JHgm-aGr1T8YeQa1jAt7htg1q2UwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

We are developing a multitenant application which is currently based on
MySQL, but we're thinking of migrating to PostgreSQL.

We rely on clustered indexes to preserve data locality for each tenant.
Primary keys start with the tenant ID. This way, rows belonging to the same
tenant are stored next to each other. Because all requests hit only one
tenant, this is a great performance improvement.

PostgreSQL doesn't have clustered indexes — I'm aware of the CLUSTER
command but it's a one-time operation — and I'm wondering if this can be a
problem or not.

Let's say we have a table containing data for 10,000 tenants and 10,000
rows per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block
contains ~10 rows. Let's way we want to compute the sum of an integer
column for all rows belonging to a given tenant ID.

In MySQL/InnoDB, rows are stored in the leaf nodes of a B-tree. To compute
the sum, MySQL has to read at least 1,000 blocks (each block containing ~10
rows). I deliberately neglect the cost of walking the B-tree intermediate
nodes.

By comparison, PostgreSQL has to read at least 10,000 blocks (each block
containing ~10 rows, but most of the time, only one row will match the
tenant ID, other rows belonging to other tenants).

A few questions:

- Am I missing something?
- Am I overestimating the benefit of a clustered index in our case, and the
cost of not having one in PostgreSQL?
- Is there another technical solution to this problem?

Thanks,
Nicolas Grilly
Managing Partner
+33 6 03 00 25 34
www.vocationcity.com • Recruitment & application tracking software
www.gardentechno.com • Consulting & software engineering

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2016-08-30 11:39:03 Re: Clustered index to preserve data locality in a multitenant application?
Previous Message Nicolas Grilly 2016-08-30 11:10:07 Clustered index to preserve data locality in a multitenant application?