From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: announce: spark-postgres 3 released |
Date: | 2019-11-11 00:56:58 |
Message-ID: | fe7e6d7b-00f8-3de6-8eec-231932277179@aklaver.com |
Lists: | pgsql-general |
On 11/10/19 4:05 PM, Nicolas Paris wrote:
> Hello postgres users,
Interesting. FYI, the announcement list is:
https://www.postgresql.org/list/pgsql-announce/
>
> Spark-postgres is designed for reliable and performant ETL in big-data
> workloads and offers read/write/scd capability to better bridge spark and
> postgres. Version 3 introduces a datasource API. It outperforms
> sqoop by a factor of 8 and the apache spark core jdbc by infinity.
>
> Features:
> - use of pg COPY statements
> - parallel reads/writes
> - use of hdfs to store intermediary csv
> - reindex after bulk-loading
> - SCD1 computations done on the spark side
> - use unlogged tables when needed
> - handle arrays and multiline string columns
> - useful jdbc functions (ddl, updates...)
>
> The official repository:
> https://framagit.org/parisni/spark-etl/tree/master/spark-postgres
>
> And its mirror on microsoft github:
> https://github.com/EDS-APHP/spark-etl/tree/master/spark-postgres
>
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com