From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Stuart Bishop <stuart(at)stuartbishop(dot)net>
Cc: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Millions of tables
Date: 2016-09-26 13:51:10
Message-ID: CAEtnbpVwn9qLuqWF3M5jE28v0AFpOHtoKowEr=uYgHovV-QncQ@mail.gmail.com
Lists: pgsql-performance
On Mon, Sep 26, 2016 at 3:43 AM, Stuart Bishop <stuart(at)stuartbishop(dot)net>
wrote:
> On 26 September 2016 at 11:19, Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
> wrote:
>
>> I did look at PostgresXL and CitusDB. Both are admirable however neither
>> could support the need to read a random record consistently under 30ms.
>> It's a similar problem Cassandra and others have: network latency. At this
>> scale, to provide the ability to access any given record amongst trillions
>> it is imperative to know precisely where it is stored (system & database)
>> and read a relatively small index. I have other requirements that prohibit
>> use of any technology that is eventually consistent.
>>
>> I liken the problem to fishing. To find a particular fish of length,
>> size, color &c in a data lake you must accept the possibility of scanning
>> the entire lake. However, if all fish were in barrels where each barrel
>> had a particular kind of fish of specific length, size, color &c then the
>> problem is far simpler.
>>
>> -Greg
>>
>
> My gut tells me that if you do solve the problem and get PostgreSQL (or
> anything) reading consistently at under 30ms with that many tables you will
> have solved one problem by creating another.
>
>
Exactly why I am exploring. What are the trade-offs?
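The barrel analogy above amounts to deterministic hash routing: given a record key, any client can compute which table holds it without scanning anything. A minimal sketch, assuming a SHA-256 hash over the key and a hypothetical table count (all names and counts here are invented for illustration):

```python
import hashlib

N_TABLES = 1_000_000  # hypothetical: one million "barrels" per kind of fish


def barrel_table(kind: str, record_key: str, n_tables: int = N_TABLES) -> str:
    """Map a record key to the single table ("barrel") holding it.

    A stable hash lets any client compute the location directly,
    with no central catalog lookup and no extra network round trip.
    """
    h = int(hashlib.sha256(record_key.encode()).hexdigest(), 16)
    return f"{kind}_{h % n_tables}"
```

With the table name known up front, a lookup touches one relatively small index instead of the whole "lake".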
> You discounted Cassandra due to network latency, but are now trying a
> monolithic PostgreSQL setup. It might be worth trying a single node
> ScyllaDB or Cassandra deploy (no need for QUORUM or network overhead),
> perhaps using leveled compaction so all your data gets broken out into
> 160MB chunks. And certainly wander over to the ScyllaDB mailing list, as
> they are very focused on performance problems like yours and should offer
> some insight even if a Cassandra style architecture cannot meet your
> requirements.
>
>
Cassandra performance, according to the experts I consulted, starts to fall
off once the stored dataset exceeds ~3 TB. Much too small for my use
case. Again, I have other reasons for not using Cassandra and the others,
chief among them deduplication of the information referenced by my millions
of tables. There are few such guarantees outside of the RDBMS realm.
> An alternative if you exhaust or don't trust other options, use a foreign
> data wrapper to access your own custom storage. A single table at the PG
> level, you can shard the data yourself into 8 bazillion separate stores, in
> whatever structure suits your read and write operations (maybe reusing an
> embedded db engine, ordered flat file+log+index, whatever).
>
>
However, even 8 bazillion FDW's may cause an "overflow" of relationships,
and at the loss of an efficient storage engine acting as a traffic
cop. In such a case, I would opt to put that logic in the app and access
the true storage directly rather than go through FDW's.
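Putting the routing logic in the app means the client holds a shard map and connects straight to the right server, with no FDW hop in between. A rough sketch of the idea, assuming hash buckets assigned to servers (hosts, database names, and bucket counts below are all hypothetical):

```python
import hashlib

# Hypothetical shard map; hosts, dbnames, and bucket ranges are invented.
SHARDS = [
    ("host=shard0.example.com dbname=lake", range(0, 512)),
    ("host=shard1.example.com dbname=lake", range(512, 1024)),
]
N_BUCKETS = 1024  # hypothetical total bucket count


def route(record_key: str) -> tuple[str, int]:
    """Return (dsn, bucket) so the app connects directly to the server
    and bucket holding the key, bypassing any FDW indirection."""
    bucket = int(hashlib.sha256(record_key.encode()).hexdigest(), 16) % N_BUCKETS
    for dsn, buckets in SHARDS:
        if bucket in buckets:
            return dsn, bucket
    raise ValueError(f"bucket {bucket} is not mapped to any shard")
```

The trade-off is exactly the one discussed above: the app takes over the traffic-cop role the storage engine would otherwise play.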
-Greg