Re: Millions of tables

From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
Cc: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Millions of tables
Date: 2016-09-26 09:43:20
Message-ID: CADmi=6O0CAJYdMw7Q863nLo_BKWR9n57xhLSw3LzNmGHMtkg0w@mail.gmail.com
Lists: pgsql-performance

On 26 September 2016 at 11:19, Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
wrote:

> I did look at PostgresXL and CitusDB. Both are admirable however neither
> could support the need to read a random record consistently under 30ms.
> It's a similar problem Cassandra and others have: network latency. At this
> scale, to provide the ability to access any given record amongst trillions
> it is imperative to know precisely where it is stored (system & database)
> and read a relatively small index. I have other requirements that prohibit
> use of any technology that is eventually consistent.
>
> I liken the problem to fishing. To find a particular fish of length,
> size, color &c in a data lake you must accept the possibility of scanning
> the entire lake. However, if all fish were in barrels where each barrel
> had a particular kind of fish of specific length, size, color &c then the
> problem is far simpler.
>
> -Greg
>

My gut tells me that if you do solve the problem and get PostgreSQL (or
anything else) reading consistently at under 30ms with that many tables, you
will have solved one problem only by creating another.

You discounted Cassandra due to network latency, but are now trying a
monolithic PostgreSQL setup. It might be worth trying a single-node
ScyllaDB or Cassandra deployment (no need for QUORUM or network overhead),
perhaps using leveled compaction so all your data gets broken out into
160MB chunks. And certainly wander over to the ScyllaDB mailing list, as
they are very focused on performance problems like yours and should offer
some insight even if a Cassandra-style architecture cannot meet your
requirements.
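
To be concrete, the single-node version is little more than a replication
factor of 1 plus leveled compaction on the table. A rough sketch, with
made-up keyspace, table and column names borrowed from your fishing analogy
(160MB is that strategy's default SSTable target anyway):

    -- single node, so no replication to wait on
    CREATE KEYSPACE lake
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    -- leveled compaction keeps the data in roughly 160MB SSTables
    CREATE TABLE lake.fish (
      barrel_id bigint,
      fish_id   bigint,
      attrs     blob,
      PRIMARY KEY (barrel_id, fish_id)
    ) WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};

Reads at CONSISTENCY ONE on a single node never leave the box, so the
network latency that ruled out the clustered deploy largely disappears.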

An alternative, if you exhaust or don't trust the other options, is to use a
foreign data wrapper to access your own custom storage. You keep a single
table at the PG level, but shard the data yourself into 8 bazillion separate
stores underneath, in whatever structure suits your read and write operations
(maybe reusing an embedded db engine, an ordered flat file + log + index,
whatever).
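
As a rough sketch of what the PG side could look like (the "fishstore_fdw"
wrapper and all of its options are entirely hypothetical, standing in for
whatever storage you actually build underneath):

    -- hypothetical custom wrapper around your own storage
    CREATE EXTENSION fishstore_fdw;

    CREATE SERVER fishstore
      FOREIGN DATA WRAPPER fishstore_fdw
      OPTIONS (root_path '/data/fishstore', shard_count '8192');

    -- one logical table; the wrapper routes each lookup to the right store
    CREATE FOREIGN TABLE fish (
      barrel_id bigint,
      fish_id   bigint,
      attrs     jsonb
    ) SERVER fishstore
      OPTIONS (shard_key 'barrel_id');

Queries stay plain SQL against one table, while the wrapper's scan routines
only ever have to open the one store a given barrel_id maps to.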

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/
