From: | Lian Jiang <jiangok2006(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: speed up full table scan using psql |
Date: | 2023-05-31 05:52:16 |
Message-ID: | CA+aY8X4JGQVev8JWB8DK2ubznM9yy0XuFJ1FmZkb_RL6vLHYdw@mail.gmail.com |
Lists: | pgsql-general |
Thanks! Will try.
Is there a way to speed up the Spark ingestion path (postgres -> spark ->
snowflake), which avoids issues such as psql's alignment padding and the
inefficient json file format? I favor Spark since it simplifies the ingest
path of postgres -> local json.gz -> s3 -> snowflake stage -> snowflake
table. Hope this makes sense.
On Tue, May 30, 2023 at 10:17 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Wed, 31 May 2023 at 16:26, Lian Jiang <jiangok2006(at)gmail(dot)com> wrote:
> > I am using psql to periodically dump the postgres tables into json files
> which are imported into snowflake. For large tables (e.g. 70M rows), it
> takes hours for psql to complete. Using spark to read the postgres table
> does not seem to work: the postgres read-only replica is the bottleneck,
> so the spark cluster never uses more than one worker node, and the working
> node times out or runs out of memory.
> >
> > Will vertical scaling the postgres db speed up psql? Or any thread
> related parameter of psql can help? Thanks for any hints.
>
> This is probably slow due to psql's alignment. It needs to read the
> entire result to know how much whitespace to pad columns with
> before it can output anything. Naturally, that requires quite a bit of
> memory when large tables are queried and also lots of length checking.
>
> As mentioned by Rob, you'll probably want to use COPY, or you could
> look at using pg_dump. Something like:
>
> pg_dump --inserts --rows-per-insert=100 --table=<name of your table to
> export> --data-only <database name>
>
> should give you something that you might have some luck directly
> importing to Snowflake without any transformation tool required.
>
> However, if you do need to do some transformation before loading, then
> you might want to do that in PostgreSQL. For that, something like:
>
> psql -c "copy (select col1,col2,col3+col4 from your_table) to stdout"
> <name of database>
>
> would allow you to run a query in which you could do your
> transformations before importing into Snowflake.
>
> David
>
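A concrete variant of the COPY suggestion above, as a sketch (the table,
column, and database names are placeholders, and the derived column is
purely illustrative): streaming the query result as CSV through gzip
produces a compressed file that can be PUT into a Snowflake stage directly,
skipping the intermediate JSON step entirely:

```shell
# Sketch: export a (possibly transformed) query result as gzipped CSV.
# your_table, col1..col4, and your_database are hypothetical names.
# \copy runs client-side, so no server filesystem access is needed.
psql -c "\copy (SELECT col1, col2, col3 + col4 AS col5 FROM your_table) \
  TO STDOUT WITH (FORMAT csv, HEADER)" your_database \
  | gzip > your_table.csv.gz
```

The resulting your_table.csv.gz can then be uploaded to a stage and loaded
with Snowflake's COPY INTO, which handles gzipped CSV natively.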