From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mason S <masonlists(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Horizontal scalability/sharding |
Date: | 2015-09-02 04:37:00 |
Message-ID: | CAA4eK1KcFJKWy73nhOVbe99dC4s1Mk_2Mxm0P4BozaaYXkAR0w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 1, 2015 at 4:25 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>
> The document opens a big question --- when queries can't be processed in
> a traditional top/down fashion, Citus has the goal of sending groups of
> results up the the coordinator, reordering them, then sending them back
> to the shards for further processing, basically using the shards as
> compute engines because the shards are no longer using local data to do
> their computations. The two examples they give are COUNT(DISTINCT) and
> a join across two sharded tables ("CANADA").
>
> I assumed these queries were going to be solved by sending as digested
> data as possible to the coordinator, and having the coordinator complete
> any remaining processing. I think we are going to need to decide if
> such "sending data back to shards" is something we are ever going to
> implement. I can see FDWs _not_ working well for that use-case.
>
Here one related point to think is how do we envision to handle statement
requests, do we want to have centeralized coordinator which will process
all requests or the requests could be received by any node?
I think both kind of systems have their own pros and cons like if we want
to have centralized coordinator kind of system, then it might be limited
by the number of simultaneous requests it can handle and if go other way
like allow requests to be processed by each individual nodes, then we
have to think about replicating all meta-data on all nodes.
I think Collecting statistics about different objects is another thing which
can differ depending on the strategy we choose to allow requests.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2015-09-02 04:43:33 | Re: security labels on databases are bad for dump & restore |
Previous Message | Amit Kapila | 2015-09-02 04:28:00 | Re: Horizontal scalability/sharding |