Re: Horizontal scalability/sharding

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mason S <masonlists(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-02 04:37:00
Message-ID: CAA4eK1KcFJKWy73nhOVbe99dC4s1Mk_2Mxm0P4BozaaYXkAR0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 1, 2015 at 4:25 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>
> The document opens a big question --- when queries can't be processed in
> a traditional top/down fashion, Citus has the goal of sending groups of
> results up the the coordinator, reordering them, then sending them back
> to the shards for further processing, basically using the shards as
> compute engines because the shards are no longer using local data to do
> their computations. The two examples they give are COUNT(DISTINCT) and
> a join across two sharded tables ("CANADA").
>
> I assumed these queries were going to be solved by sending as digested
> data as possible to the coordinator, and having the coordinator complete
> any remaining processing. I think we are going to need to decide if
> such "sending data back to shards" is something we are ever going to
> implement. I can see FDWs _not_ working well for that use-case.
>

Here one related point to think is how do we envision to handle statement
requests, do we want to have centeralized coordinator which will process
all requests or the requests could be received by any node?
I think both kind of systems have their own pros and cons like if we want
to have centralized coordinator kind of system, then it might be limited
by the number of simultaneous requests it can handle and if go other way
like allow requests to be processed by each individual nodes, then we
have to think about replicating all meta-data on all nodes.

I think Collecting statistics about different objects is another thing which
can differ depending on the strategy we choose to allow requests.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2015-09-02 04:43:33 Re: security labels on databases are bad for dump & restore
Previous Message Amit Kapila 2015-09-02 04:28:00 Re: Horizontal scalability/sharding