Re: speed up full table scan using psql

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Lian Jiang <jiangok2006(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: speed up full table scan using psql
Date: 2023-05-31 05:16:53
Message-ID: CAApHDvp7p_DL4+Gqc+MqKNUvG6E1LrKJ6koD5s=QTjbeTSKqsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 31 May 2023 at 16:26, Lian Jiang <jiangok2006(at)gmail(dot)com> wrote:
> I am using psql to periodically dump the postgres tables into json files which are imported into snowflake. For large tables (e.g. 70M rows), it takes hours for psql to complete. Using spark to read the postgres table seems not to work as the postgres read only replication is the bottleneck so spark cluster never uses >1 worker node and the working node timeout or out of memory.
>
> Will vertical scaling the postgres db speed up psql? Or any thread related parameter of psql can help? Thanks for any hints.

This is probably slow due to psql's alignment. It needs to read the
entire result to know how much to whitespace to pad columns with
before it can output anything. Naturally, that requires quite a bit of
memory when large tables are queried and also lots of length checking.

As mentioned by Rob, you'll probably want to use COPY, or you could
look at using pg_dump. Something like:

pg_dump --inserts --rows-per-insert=100 --table=<name of your table to
export> --data-only <database name>

should give you something that you might have some luck directly
importing to Snowflake without any transformation tool required.

However, if you do need to do some transformation before loading, then
you might want to do that in PostgreSQL. For that, something like:

psql -c "copy (select col1,col2,col3+col4 from your_table) to stdout"
<name of database>

would allow you to run a query, which you could maybe do your
transformations in before importing into Snowflake

David

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lian Jiang 2023-05-31 05:52:16 Re: speed up full table scan using psql
Previous Message Rob Sargent 2023-05-31 04:36:40 Re: speed up full table scan using psql