From: | Nicolas Paris <nicolas(dot)paris(at)riseup(dot)net> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: How to import Apache parquet files? |
Date: | 2019-11-11 00:16:49 |
Message-ID: | 20191111001649.cpvzp7f4qgzzjxgo@riseup.net |
Lists: | pgsql-general |
> I would like to import (lots of) Apache parquet files to a PostgreSQL 11
You might be interested in the spark-postgres library. It lets you bulk
load parquet files with a single Spark command:
    spark
      .read.format("parquet")
      .load(parquetFilesPath) // read the parquet files
      .write.format("postgres")
      .option("host","yourHost")
      .option("partitions", 4) // 4 threads
      .option("table","theTable")
      .option("user","theUser")
      .option("database","thePgDatabase")
      .option("schema","thePgSchema")
      .save // bulk load into postgres (a Spark writer ends with save, not load)
more details at https://github.com/EDS-APHP/spark-etl/tree/master/spark-postgres
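
If Spark is not an option, the pyarrow route mentioned in the original
question could look roughly like the sketch below. It reads a parquet file
in batches with pyarrow and streams each batch to PostgreSQL through
COPY ... FROM STDIN in csv format. The helper names (quote_ident,
copy_statement, rows_to_csv_buffer, load_parquet), the batch size and the
choice of psycopg2 are assumptions for illustration only; they are not part
of spark-postgres. NULL handling and type conversion are simplified.

```python
import csv
import io


def quote_ident(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement(schema, table, columns):
    """Build the COPY ... FROM STDIN statement for the target table."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return "COPY {}.{} ({}) FROM STDIN WITH (FORMAT csv)".format(
        quote_ident(schema), quote_ident(table), cols
    )


def rows_to_csv_buffer(columns, rows):
    """Serialize a list of dict rows into an in-memory CSV buffer.

    None is written as an empty field. Simplification: this sketch does
    not distinguish NULL from an empty string.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if row[c] is None else row[c] for c in columns])
    buf.seek(0)
    return buf


def load_parquet(path, conn, schema, table, batch_size=10000):
    """Stream one parquet file into an existing table, batch by batch.

    Assumes pyarrow is installed and conn is a psycopg2 connection.
    """
    import pyarrow.parquet as pq  # third-party, imported lazily

    parquet_file = pq.ParquetFile(path)
    columns = parquet_file.schema_arrow.names
    sql = copy_statement(schema, table, columns)
    with conn.cursor() as cur:
        for batch in parquet_file.iter_batches(batch_size=batch_size):
            cur.copy_expert(sql, rows_to_csv_buffer(columns, batch.to_pylist()))
    conn.commit()
```

Usage would be along the lines of
load_parquet("/path/to/file.parquet", psycopg2.connect("dbname=thePgDatabase user=theUser host=yourHost"), "thePgSchema", "theTable"),
with the target table created beforehand with matching columns. This is
single-threaded, so for lots of files the Spark approach above will likely
be faster.
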
On Tue, Nov 05, 2019 at 03:56:26PM +0100, Softwarelimits wrote:
> Hi, I need to come and ask here, I did not find enough information so I hope I
> am just having a bad day or somebody is censoring my search results for fun...
> :)
>
> I would like to import (lots of) Apache parquet files to a PostgreSQL 11
> cluster - yes, I believe it should be done with the Python pyarrow module, but
> before digging into the possible traps I would like to ask here if there is
> some common, well understood and documented tool that may be helpful with that
> process?
>
> It seems that the COPY command can import binary data, but I am not able to
> allocate enough resources to understand how to implement a parquet file import
> with that.
>
> I really would like to follow a person with much more knowledge than me about
> either PostgreSQL or Apache parquet format instead of inventing a bad wheel.
>
> Any hints very welcome,
> thank you very much for your attention!
> John
--
nicolas