Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly <nicolas(at)vocationcity(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Cc: spam_eater(at)gmx(dot)net, Lunar Lander <msofen(at)runbox(dot)com>, Eduardo Morras <emorrasg(at)yahoo(dot)es>, Vick Khera <vivek(at)khera(dot)org>, Kenneth Marshall <ktm(at)rice(dot)edu>, Ben Chobot <bench(at)silentmedia(dot)com>, Igor Neyman <ineyman(at)perceptron(dot)com>
Subject: Re: Clustered index to preserve data locality in a multitenant application?
Date: 2016-09-06 16:21:33
Message-ID: CAG3yVS6=99PmvRC8Wvb6HsFjNZVOtLLcz5riAipifCVXsTD3zQ@mail.gmail.com
Lists: pgsql-general

On Tue, Aug 30, 2016 at 1:10 PM, Nicolas Grilly <nicolas(at)vocationcity(dot)com>
wrote:

> We are developing a multitenant application which is currently based on
> MySQL, but we're thinking of migrating to PostgreSQL.
>
> We rely on clustered indexes to preserve data locality for each tenant.
> Primary keys start with the tenant ID. This way, rows belonging to the same
> tenant are stored next to each other. Because all requests hit only one
> tenant, this is a great performance improvement.
>
> PostgreSQL doesn't have clustered indexes — I'm aware of the CLUSTER
> command but it's a one-time operation — and I'm wondering if this can be a
> problem or not.
>
> Let's say we have a table containing data for 10,000 tenants and 10,000
> rows per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block
> contains ~10 rows. Let's say we want to compute the sum of an integer
> column for all rows belonging to a given tenant ID.
>
> In MySQL/InnoDB, rows are stored in the leaf nodes of a B-tree. To compute
> the sum, MySQL has to read at least 1,000 blocks (each block containing ~10
> rows). I deliberately neglect the cost of walking the B-tree intermediate
> nodes.
>
> By comparison, PostgreSQL has to read at least 10,000 blocks (each block
> containing ~10 rows, but most of the time, only one row will match the
> tenant ID, other rows belonging to other tenants).
>
> A few questions:
>
> - Am I missing something?
> - Am I overestimating the benefit of a clustered index in our case, and
> the cost of not having one in PostgreSQL?
> - Is there another technical solution to this problem?
>
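
To make the arithmetic above concrete, the estimate can be written out as a plain query (figures taken from the example; the ~10 rows per 8 KB block ratio is an assumption):

```sql
-- Back-of-the-envelope estimate using the figures from the example above.
SELECT 10000 * 10000 AS total_rows,         -- 100,000,000 rows overall
       10000 / 10    AS blocks_clustered,   -- ~1,000 blocks per tenant (rows contiguous)
       10000         AS blocks_unclustered; -- worst case: 1 matching row per block
```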

After a few days of thinking and researching, I've come to the following
conclusions:

1) Trying to preserve data locality in a multi-tenant application is a
legitimate requirement (because it can significantly improve performance).
2) But index organized tables are not the right answer.
3) Partitioning by tenant_id, sorting heap rows by tenant_id, and/or
creating covering indexes (for index-only scans) are better solutions.

I found at least two examples of well-known companies using PostgreSQL at
scale for a multi-tenant application, and taking specific steps to preserve
data locality:

- Heap shards its data by customer_id (all rows in a logical shard belong
to the same customer — except for small customers, but it's easy to make
their queries fast anyway) [1].
- Instagram uses pg_reorg (the ancestor of pg_repack) to keep likes created
by the same user contiguous on disk [2].

At first, I thought that index organized tables (aka clustered indexes)
were the solution, and that their absence in PostgreSQL could be an issue,
but I was wrong. They are not the right tool for the job.

Index organized tables are a powerful tool for an application that needs
very fast lookups and range scans on the primary key. But they also have
significant drawbacks [3]:

- Lookups on secondary keys are slower (because they require an extra
indirection through the primary key index).
- The index is bigger (because rows are stored directly in the index,
instead of a heap).
- InnoDB can only cluster on the primary key, which is very restrictive
(by contrast, in PostgreSQL, recently inserted/updated rows are naturally
stored together in the heap, independently of the chosen primary key).

So, PostgreSQL uses heap organized tables instead of index organized
tables, and this is a good thing, at least for the kind of queries my
application needs. But, at scale, I still need to find a way to preserve
data locality for each tenant.

The solutions I've identified are:

- Partition by tenant_id as suggested by Thomas Kellerer earlier in this
thread. Declarative partitioning will make this easier in a future version
of PostgreSQL. It's also possible to "shard" using Citus Data (like Heap or
CloudFlare).
- Periodically sort rows by tenant_id in the heap, using something like
pg_repack, as suggested by Kenneth Marshall and Ben Chobot.
- Create covering indexes, which let PostgreSQL do index-only scans
(exactly like an index organized table), as suggested by Eduardo Morras and
engineers at Heap [4].
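
As a sketch of how these three options look in practice (table and column
names are hypothetical; declarative partitioning shipped in PostgreSQL 10,
and primary keys on partitioned tables in PostgreSQL 11, both after this
message was written):

```sql
-- 1) Declarative partitioning by tenant (PostgreSQL 11+; names illustrative):
CREATE TABLE events (
    tenant_id bigint  NOT NULL,
    event_id  bigint  NOT NULL,
    amount    integer NOT NULL,
    PRIMARY KEY (tenant_id, event_id)   -- partition key must be included
) PARTITION BY HASH (tenant_id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ... and likewise for remainders 1 to 3.

-- 2) Periodically re-sort the heap by tenant, e.g. with pg_repack (shell):
--      pg_repack --table=events --order-by=tenant_id mydb

-- 3) Covering index: a composite index on (tenant_id, amount) lets the
--    per-tenant sum be answered by an index-only scan, without heap access:
CREATE INDEX events_tenant_amount_idx ON events (tenant_id, amount);
SELECT sum(amount) FROM events WHERE tenant_id = 42;
```

Note that index-only scans also depend on the visibility map being
reasonably up to date (i.e. on regular vacuuming).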

It looks like I can move forward with our migration from MySQL to
PostgreSQL, without worrying about the lack of clustered indexes, because
there are better solutions to keep tenant data contiguous!

Thanks for all your comments.

Nicolas Grilly

[1] https://news.ycombinator.com/item?id=12412216
[2] http://instagram-engineering.tumblr.com/post/40781627982/handling-growth-with-postgres-5-tips-from
[3] https://dzone.com/articles/unreasonable-defaults-primary
[4] "Powering Heap", https://www.youtube.com/watch?v=NVl9_6J1G60
