Re: Millions of tables

From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Mike Sofen <msofen(at)runbox(dot)com>, "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Millions of tables
Date: 2016-09-28 17:38:42
Message-ID: CAEtnbpXMbsaWGF1jByqC0nK9DGS4_7mvHjRYQnL_cWVzT7ov2Q@mail.gmail.com
Lists: pgsql-performance

On Wed, Sep 28, 2016 at 11:27 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greg,
>
> * Greg Spiegelberg (gspiegelberg(at)gmail(dot)com) wrote:
> > Bigger buckets mean a wider possibility of response times. Some buckets
> > may contain 140k records and some 100X more.
>
> Have you analyzed the depth of the btree indexes to see how many more
> pages need to be read to handle finding a row in 140k records vs. 14M
> records vs. 140M records?
>
> I suspect you'd find that the change in actual depth (meaning how many
> pages have to actually be read to find the row you're looking for) isn't
> very much and that your concern over the "wider possibility of response
> times" isn't well founded.
>
>
Stephen,
Excellent feedback! Um, how does one look at btree depth in PostgreSQL?
I know how to do this in Oracle but have not done the same in PG. Pointers?
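
(Guessing here: maybe pgstattuple's pgstatindex(), or pageinspect's bt_metap()?
Something along these lines, with a placeholder index name:

  CREATE EXTENSION IF NOT EXISTS pgstattuple;
  -- tree_level is the level of the root page; leaf pages are level 0
  SELECT tree_level, internal_pages, leaf_pages
  FROM pgstatindex('my_bucket_pkey');  -- placeholder index name

Happy to be corrected if there's a better tool for this.)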

> Since you have a hard-set 30ms maximum for query response time, I would
> suggest you work out how long it takes to read a cold page from your I/O
> subsystem and then you can work through exactly how many page reads
> could be done in that 30ms (or perhaps 20ms, to allow for whatever
> overhead there will be in the rest of the system and as a buffer) and
> then work that back to how deep the index can be based on that many page
> reads and then how many records are required to create an index of that
> depth. Of course, the page from the heap will also need to be read and
> there's a bit of additional work to be done, but the disk i/o for cold
> pages is almost certainly where most time will be spent.
>
> I suspect you'll discover that millions of tables is a couple orders of
> magnitude off of how many you'd need to keep the number of page reads
> below the threshold you work out based on your I/O.
>
> Of course, you would need a consistent I/O subsystem, or at least one
> where you know the maximum possible latency to pull a cold page.
>
> Lastly, you'll want to figure out how to handle system crash/restart if
> this system requires a high uptime. I expect you'd want to have at
> least one replica and a setup which allows you to flip traffic to it
> very quickly to maintain the 30ms response times.
>

I'm replicating via messaging. PG replication is fine for smaller databases,
but I don't trust networks, and PG upgrade intricacies complicate matters.
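
Working your arithmetic through as a sanity check (rough numbers only,
assuming ~5 ms per cold random read and ~300 keys per index page):

  30 ms budget - ~10 ms overhead   ->  ~20 ms left for cold I/O
  20 ms / ~5 ms per cold read      ->  ~4 page reads
  4 reads - 1 heap page            ->  3 btree levels
  ~300^3 keys                      ->  ~27M rows reachable in 3 levels

So a bucket in the tens of millions of rows still fits the budget even with
every page read cold, which does suggest millions of tables is well past
what's needed.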

-Greg
