Re: Horizontal scalability/sharding

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mason S <masonlists(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-01 17:00:06
Message-ID: CA+TgmoZ+HDbrUOWQRcH-jog2QXGqoJ=kmo263Ga3nWJGMZ9gEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 1, 2015 at 7:01 AM, Mason S <masonlists(at)gmail(dot)com> wrote:
> For efficient internodes joins with row shipping, FDWs may also not be easy
> to do. Maybe it is possible if we optionally pass in lists of other nodes
> and information about how they are partitioned so data knows where to get
> shipped.
>
> A challenge for planning with arbitrary copies of different shards is that
> sometimes you may be able to push down joins, sometimes not. Planning and
> execution get ugly. Maybe this can be simplified by parent-child tables
> following the same partitioning scheme.

This gets at a problem which Ozgun also mentioned in his Google
document, and which I also discussed with Etsuro Fujita at PGCon: good
query planning requires good metadata, and we don't really have that
today.

I think that a big part of the goal of the declarative partitioning
work that Amit Langote has recently been undertaking is to provide a
catalog representation of the partitioning structure that is easy to
work with, as opposed to just having a bunch of CHECK constraints that
you have to try to reason about. That's one part of the solution.

You also might need to know more about the remote table than is
captured by the column and data type list. Most particularly, you
might want to know what indexes exist on the remote side, but
currently, to figure out that out, you'd need to send queries to
retrieve that information to the remote side every time you do
planning, or maybe you could contrive a session-lifespan cache.
That's pretty annoying.

One idea for solving this problem is to allow CREATE INDEX on foreign
tables, but I don't like that much. There's no guarantee that the
remote side is a PostgreSQL instance, and if it isn't, the relevant
details about the indexes that exist may not be convenient to
represent in our catalogs. Heck, that can be true even if it is a
PostgreSQL instance, if the remote side relies on an AM or a function
that doesn't exist locally. But even if both sides are PostgreSQL
instances using only btree indexes on raw columns, now you've put the
burden on the DBA to make sure that the index definitions on the local
and remote sides match, and that's a pain in the neck.

What seems better to me is to allow ANALYZE of a foreign table a place
to record an arbitrary blob of metadata about the remote side that it
can then get access to during planning. Then, you can record details
about indexes, or statistics that don't fit into the mold of
pg_statistic, or really, anything else you're going to need to figure
out the best plan, and if the DBA changes the configuration on the
remote side, they don't need to update the local configuration to
match; a re-ANALYZE will do the trick.

I'm open to other ideas as well. Repartitioning will also arise for
an all-local parallel join, and it would be nice if the planner smarts
could be shared between that case and the remote-table case.
Therefore, while I suspect that some of the logic here will end up
inside one or more FDWs, I'm pretty confident that a significant chunk
of it needs to go into the core optimizer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-09-01 17:02:04 Re: On-demand running query plans using auto_explain and signals
Previous Message Robert Haas 2015-09-01 16:40:40 Re: Horizontal scalability/sharding