We have just released the PGSpider extension (pgspider_ext).
It is an extension for building a high-performance SQL cluster engine for distributed big data.
PGSpider enables PostgreSQL to access multiple data sources through Foreign Data Wrappers (FDW) and to retrieve the distributed data vertically.
The main features are:
* Node partitioned table
Users can easily retrieve records from multiple tables on several data sources with a single SQL statement.
Suppose there are two data sources that hold the following records:
SELECT * FROM t1_node1; -- @node1
 i  | t
----+---
 10 | a
 11 | b
(2 rows)
SELECT * FROM t1_node2; -- @node2
 i  | t
----+---
 20 | c
 21 | d
(2 rows)
PGSpider can collect these records into one result, together with a node identifier column, like this:
SELECT * FROM t1;
 i  | t | node
----+---+-------
 10 | a | node1
 11 | b | node1
 20 | c | node2
 21 | d | node2
(4 rows)
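To make the shape of this result concrete, here is a minimal sketch in plain PostgreSQL that reproduces it without pgspider_ext. It is only an emulation under stated assumptions: the two tables are created locally as stand-ins for the remote data sources, and the merged t1 is an ordinary view. It does not show how pgspider_ext itself defines t1.

```sql
-- Local stand-ins for the tables on node1 and node2 (hypothetical setup,
-- in a real deployment these rows live on separate data sources).
CREATE TABLE t1_node1 (i int, t text);
CREATE TABLE t1_node2 (i int, t text);
INSERT INTO t1_node1 VALUES (10, 'a'), (11, 'b');
INSERT INTO t1_node2 VALUES (20, 'c'), (21, 'd');

-- Emulate the merged table: each branch tags its rows with a node name.
CREATE VIEW t1 AS
    SELECT i, t, 'node1'::text AS node FROM t1_node1
    UNION ALL
    SELECT i, t, 'node2'::text AS node FROM t1_node2;

-- Returns the four rows listed above, each with its node identifier.
SELECT * FROM t1 ORDER BY i;
```

With the extension, the node column comes from the table definition, so no hand-written UNION ALL view is needed for each set of tables.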
* Parallel processing
PGSpider can fetch results from data sources in parallel.
* Pushdown
PGSpider can push down WHERE clauses and aggregate functions to the data sources.
Whether a given expression can be shipped depends on the FDW used for each data source.
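One way to check what is actually pushed down is to inspect the query plan. The sketch below assumes that t1 is a node-partitioned table like the one in the example above. With postgres_fdw as the data source FDW, the verbose plan shows a "Remote SQL" line for each foreign scan. Other FDWs may report the remote query differently or not at all, and the exact plan produced by pgspider_ext is not shown here.

```sql
-- If the WHERE clause and the aggregate are shipped, they appear in the
-- remote query of the plan rather than as local Filter/Aggregate nodes.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT count(*) FROM t1 WHERE i > 10;
```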
pgspider_ext is developed by Toshiba Software Engineering & Technology Center.
Source repository: https://github.com/pgspider/pgspider_ext
Best Regards,
Mototaka Kanematsu