| From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> | 
|---|---|
| To: | Lian Jiang <jiangok2006(at)gmail(dot)com> | 
| Cc: | pgsql-general(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: speed up full table scan using psql | 
| Date: | 2023-05-31 21:43:09 | 
| Message-ID: | d6b0d93a-e0fe-c92a-bec1-1f4de5627952@aklaver.com | 
| Lists: | pgsql-general | 
On 5/31/23 13:57, Lian Jiang wrote:
> The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq) 
> from public.mytable x_tmp_uniq) to stdout"
> postgres version:  14.7
> Does this mean COPY and java CopyManager may not help since my psql 
> command already uses copy?
I don't think the issue is COPY itself but row_to_json(x_tmp_uniq).
This:
https://towardsdatascience.com/spark-essentials-how-to-read-and-write-data-with-pyspark-5c45e29227cd
indicates Spark can use CSV as an input source.
Given that, I would just COPY the data out as CSV.
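Something along these lines (untested sketch, reusing the table name from your command; adjust the column list and output file to suit) should get you a CSV instead of JSON:

psql $db_url -c "copy (select * from public.mytable) to stdout with (format csv, header)" > mytable.csv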
> 
> Regarding pg_dump, it does not support json format which means extra 
> work is needed to convert the supported format to jsonl (or parquet) so 
> that they can be imported into snowflake. Still exploring but want to 
> call it out early. Maybe 'custom' format can be parquet?
> 
> 
> Thanks
> Lian
-- 
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com