Re: Millions of tables

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
Cc: julyanto(at)equnix(dot)co(dot)id, "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Millions of tables
Date: 2016-09-29 08:21:39
Message-ID: CANP8+jKE0A1s6UK3F+EmPEmv9b_zWJ6T-qsMGvUJyEfLCZ7HTg@mail.gmail.com
Lists: pgsql-performance

On 26 September 2016 at 05:19, Greg Spiegelberg <gspiegelberg(at)gmail(dot)com> wrote:
> I did look at PostgresXL and CitusDB. Both are admirable however neither
> could support the need to read a random record consistently under 30ms.
> It's a similar problem Cassandra and others have: network latency. At this
> scale, to provide the ability to access any given record amongst trillions
> it is imperative to know precisely where it is stored (system & database)
> and read a relatively small index. I have other requirements that prohibit
> use of any technology that is eventually consistent.

Then XL is exactly what you need: it lets you compute exactly which
datanode holds a record from its hash and access it directly, which
makes the request a single-datanode task.

XL is not the same as CitusDB.
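To illustrate the routing point above: in Postgres-XL a table can be declared with `DISTRIBUTE BY HASH (id)`, after which the coordinator derives the owning datanode from the distribution key alone, so a point lookup never fans out to the whole cluster. The sketch below is a simplified illustration of that idea, not XL's actual implementation; the node names and the hash function are hypothetical stand-ins.

```python
# Illustrative sketch (NOT Postgres-XL's internal code): with hash
# distribution, the node that owns a row is a pure function of the
# distribution key, so a point read touches exactly one datanode.

DATANODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical 4-node cluster

def datanode_for(key: int) -> str:
    """Map a distribution-key value to its owning datanode
    (hash of the key, modulo the number of datanodes)."""
    return DATANODES[hash(key) % len(DATANODES)]

# The same key always routes to the same single node, which is what
# keeps a random-record read to one small index lookup on one machine.
assert datanode_for(42) == datanode_for(42)
```

The trade-off is the usual one for hash distribution: point lookups on the distribution key are cheap, but queries that filter on other columns still have to ask every datanode.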

> I liken the problem to fishing. To find a particular fish of length, size,
> color &c in a data lake you must accept the possibility of scanning the
> entire lake. However, if all fish were in barrels where each barrel had a
> particular kind of fish of specific length, size, color &c then the problem
> is far simpler.

The task of putting the fish in the appropriate barrel is quite hard.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
