Quick Links

Re: Millions of tables

From:	Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To:	Mike Sofen <msofen(at)runbox(dot)com>
Cc:	"pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Millions of tables
Date:	2016-09-26 04:05:18
Message-ID:	CAEtnbpX1wr5xnZXyA8pKKytZrGW8yMucKY4ap99ng92BPzW1BQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Precisely why I shared with the group. I must understand the risks
involved. I need to explore if it can be stable at this size when does it
become unstable? Aside from locking down user access to superuser, is
there a way to prohibit database-wide VACUUM & ANALYZE? Certainly putting
my trust in autovacuum :) which is something I have not yet fully explored
how to best tune.

Couple more numbers... ~231 GB is the size of PGDATA with 8M empty tables
and 16M empty indexes. ~5% of inodes on the file system have been used.
Sar data during the 8M table creation shows a very stable and regular I/O
pattern. Not a blip worth mentioning.

Another point worth mentioning, the tables contain a boolean, int8's and
timestamptz's only. Nothing of variable size like bytea, text, json or
xml. Each of the 8M tables will contain on the very high side between 140k
and 200k records. The application also has a heads up as to which table
contains which record. The searches come in saying "give me record X from
partition key Y" where Y identifies the table and X is used in the filter
on the table.

Last point, add column will never be done. I can hear eyes rolling :) but
the schema and it's intended use is complete. You'll have to trust me on
that one.

-Greg

On Sun, Sep 25, 2016 at 9:23 PM, Mike Sofen <msofen(at)runbox(dot)com> wrote:

> *From:* Greg Spiegelberg *Sent:* Sunday, September 25, 2016 7:50 PM
> … Over the weekend, I created 8M tables with 16M indexes on those tables.
>
> … A system or database crash could take potentially hours to days to
> recover. There are likely other issues ahead.
>
>
>
> You may wonder, "why is Greg attempting such a thing?" I looked at
> DynamoDB, BigTable, and Cassandra. I like Greenplum but, let's face it,
> it's antiquated and don't get me started on "Hadoop". Problem with the
> "one big table" solution is I anticipate 1,200 trillion records. Random
> access is expected and the customer expects <30ms reads for a single record
> fetch.
>
>
>
> I'm not looking for alternatives yet but input to my test.
>
> _________
>
>
>
> Holy guacamole, batman! Ok, here’s my take: you’ve traded the
> risks/limitations of the known for the risks of the unknown. The unknown
> being, in the numerous places where postgres historical development may
> have cut corners, you may be the first to exercise those corners and flame
> out like the recent SpaceX rocket.
>
>
>
> Put it another way – you’re going to bet your career (perhaps) or a
> client’s future on an architectural model that just doesn’t seem feasible.
> I think you’ve got a remarkable design problem to solve, and am glad you’ve
> chosen to share that problem with us.
>
>
>
> And I do think it will boil down to this: it’s not that you CAN do it on
> Postgres (which you clearly can), but once in production, assuming things
> are actually stable, how will you handle the data management aspects like
> inevitable breakage, data integrity issues, backups, restores, user
> contention for resources, fault tolerance and disaster recovery. Just
> listing the tables will take forever. Add a column? Never. I do think
> the amount of testing you’ll need to do prove that every normal data
> management function still works at that table count…that in itself is going
> to be not a lot of fun.
>
>
>
> This one hurts my head. Ironically, the most logical destination for this
> type of data may actually be Hadoop – auto-scale, auto-shard, fault
> tolerant, etc…and I’m not a Hadoopie.
>
>
>
> I am looking forward to hearing how this all plays out, it will be quite
> an adventure! All the best,
>
>
>
> Mike Sofen (Synthetic Genomics…on Postgres 9.5x)
>

In response to

Re: Millions of tables at 2016-09-26 03:23:28 from Mike Sofen

Responses

Re: Millions of tables at 2016-09-26 05:04:52 from Gavin Flower

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Greg Spiegelberg	2016-09-26 04:19:21	Re: Millions of tables
Previous Message	Mike Sofen	2016-09-26 03:23:28	Re: Millions of tables