Re: speed up full table scan using psql

From: Lian Jiang <jiangok2006(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: speed up full table scan using psql
Date: 2023-05-31 05:52:16
Message-ID: CA+aY8X4JGQVev8JWB8DK2ubznM9yy0XuFJ1FmZkb_RL6vLHYdw@mail.gmail.com
Lists: pgsql-general

Thanks! Will try.

Is there a way to speed up the Spark ingestion path (postgres -> spark ->
snowflake) that avoids issues such as psql's alignment buffering or the
inefficient json file format? I favor Spark since it simplifies the ingest
path of postgres -> local json.gz -> s3 -> snowflake stage -> snowflake
table. Hope this makes sense.
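One option worth checking, sketched below with entirely hypothetical host, table, column names, and bounds: Spark's JDBC source can read a Postgres table over several parallel connections if it is given an explicit partitioning column, which may get past the single-busy-worker behaviour described in the original message.

```python
# Sketch of a partitioned Spark JDBC read of a Postgres table.
# All connection details, names, and bounds below are placeholders.
jdbc_options = {
    "url": "jdbc:postgresql://replica-host:5432/mydb",  # hypothetical replica
    "dbtable": "public.your_table",
    "user": "reader",
    "driver": "org.postgresql.Driver",
    # These four options make Spark issue numPartitions queries in
    # parallel, each scanning one slice of partitionColumn's range.
    # The column should be numeric/date/timestamp and ideally indexed.
    "partitionColumn": "id",
    "lowerBound": "1",
    "upperBound": "70000000",   # roughly max(id) for a 70M-row table
    "numPartitions": "16",
}

# With a live SparkSession `spark`, the read and export would look like:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
#   df.write.parquet("s3://bucket/your_table/")
# Each of the 16 tasks then runs a query of the form:
#   SELECT * FROM public.your_table WHERE id >= lo AND id < hi
```

Whether this helps depends on the replica: 16 parallel sequential scans can still saturate a single read replica's I/O, so vertically scaling the replica may still be needed.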

On Tue, May 30, 2023 at 10:17 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Wed, 31 May 2023 at 16:26, Lian Jiang <jiangok2006(at)gmail(dot)com> wrote:
> > I am using psql to periodically dump the postgres tables into json files
> > which are imported into snowflake. For large tables (e.g. 70M rows), it
> > takes hours for psql to complete. Using spark to read the postgres table
> > does not seem to work: the postgres read-only replica is the bottleneck,
> > so the spark cluster never uses more than one worker node, and that
> > worker times out or runs out of memory.
> >
> > Will vertically scaling the postgres db speed up psql? Or can any
> > thread-related parameter of psql help? Thanks for any hints.
>
> This is probably slow due to psql's aligned output. It needs to read the
> entire result to know how much whitespace to pad each column with
> before it can output anything. Naturally, that requires quite a bit of
> memory when large tables are queried, and also lots of length checking.
>
> As mentioned by Rob, you'll probably want to use COPY, or you could
> look at using pg_dump. Something like:
>
> pg_dump --inserts --rows-per-insert=100 \
>   --table=<name of your table to export> --data-only <database name>
>
> should give you something that you might have some luck directly
> importing to Snowflake without any transformation tool required.
>
> However, if you do need to do some transformation before loading, then
> you might want to do that in PostgreSQL. For that, something like:
>
> psql -c "copy (select col1,col2,col3+col4 from your_table) to stdout"
> <name of database>
>
> would allow you to run a query in which you could do your
> transformations before importing into Snowflake.
>
> David
>
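The COPY route suggested above can be given a concrete shape as a shell pipeline; the sketch below builds it in Python, with the database, table, and column names as placeholders. Streaming the CSV straight into gzip avoids both the alignment buffering and the json.gz intermediate step.

```python
# Build the COPY-based export suggested above as a shell pipeline.
# Database, table, and column names are placeholders.
import shlex

# COPY ... TO STDOUT streams rows as they are produced, so psql never
# has to buffer the whole result the way its aligned output mode does.
query = (
    "COPY (SELECT col1, col2, col3 + col4 FROM your_table) "
    "TO STDOUT WITH (FORMAT csv)"
)
psql_cmd = ["psql", "--command", query, "your_database"]

# Compressing on the fly yields a csv.gz ready for the s3 -> Snowflake
# stage step, replacing the json.gz intermediate:
pipeline = shlex.join(psql_cmd) + " | gzip > your_table.csv.gz"
print(pipeline)
```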

