On 5/30/23 22:25, Lian Jiang wrote:
> Hi,
>
> I am using psql to periodically dump Postgres tables into JSON
> files, which are then imported into Snowflake. For large tables (e.g. 70M
> rows), psql takes hours to complete. Reading the Postgres table with
> Spark does not seem to work either: the Postgres read-only replication
> is the bottleneck, so the Spark cluster never uses more than one worker
> node, and that worker node times out or runs out of memory.
>
> Will vertically scaling the Postgres DB speed up psql? Or can any
> thread-related psql parameter help? Thanks for any hints.
>
> Regards
> Lian
Have you looked into the COPY command, or the CopyManager Java class?
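
To make the COPY suggestion concrete, here is a rough sketch in Python rather than Java. It assumes the psycopg2 driver is installed and streams a table to a local file with COPY ... TO STDOUT. The names build_copy_sql and dump_table, the table name, and the DSN are made up for illustration, and it writes CSV instead of JSON on the assumption that CSV is acceptable for your Snowflake load.

```python
import re

# Conservative check for plain, unquoted identifiers (no schema prefix).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_copy_sql(table, columns=None):
    """Build a COPY ... TO STDOUT statement that emits CSV with a header row."""
    columns = list(columns or [])
    for name in [table] + columns:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    col_list = f" ({', '.join(columns)})" if columns else ""
    return f"COPY {table}{col_list} TO STDOUT WITH (FORMAT csv, HEADER)"


def dump_table(dsn, table, out_path, columns=None):
    """Stream the table into out_path through the COPY protocol."""
    import psycopg2  # assumption: psycopg2 is available in this environment

    sql = build_copy_sql(table, columns)
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur, open(out_path, "wb") as out:
            # copy_expert writes the server's COPY output into the file object.
            cur.copy_expert(sql, out)
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical table and columns; only prints the statement.
    print(build_copy_sql("events", ["id", "created_at"]))
```

A call would look something like dump_table("dbname=mydb host=replica", "events", "/tmp/events.csv"), with the connection string adjusted to your setup. From psql itself, the \copy meta-command does the same thing on the client side.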