From: | Sushrut Shivaswamy <sushrut(dot)shivaswamy(at)gmail(dot)com> |
---|---|
To: | Sutou Kouhei <kou(at)clear-code(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Columnar format export in Postgres |
Date: | 2024-06-13 17:00:24 |
Message-ID: | CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg@mail.gmail.com |
Lists: | pgsql-hackers |
Thanks for the response.
I had considered using COPY TO to export columnar data, but gave up on it
since its set of formats wasn't extensible.
It's great to see that you are making it extensible.
I'm still going through the thread of comments on your patch, but I have
some early thoughts about using it for columnar data export.
- To maintain data freshness, there would need to be a way to schedule
  periodic exports using `COPY TO` with a Parquet format.
  - pg_analytica already has the scheduling logic. Once Parquet support is
    available, `COPY TO` could be used to export the data instead of the
    chunked table reads that pg_analytica uses currently.
- To facilitate efficient querying, it would help to export multiple
  Parquet files per table instead of a single file.
  Having multiple files allows queries to skip entire files whose key range
  does not match the query's filter criteria.
  Even within a single file, it would help to be able to configure the
  row group size.
  - I'm not sure how these parameters would be exposed in `COPY TO`.
    Or perhaps the extension implementing the `COPY TO` handler would
    allow this configuration?
- Regarding using file_fdw to read Apache Arrow and Apache Parquet files,
  since file_fdw is based on COPY FROM:
  - I'm not entirely clear on this. file_fdw seems to allow creating a table
    from data on disk that was exported using COPY TO.
    But does the newly created table keep using the data on disk (perhaps
    in a columnar format or CSV), or does it just read that data to create
    a row-based table?
    I'm not aware of any current capability in the Postgres planner to read
    columnar files without an extension like parquet_fdw.
  - For your use case, how do you plan to query the Arrow / Parquet data?
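The idea of splitting an export into multiple files, each with configurable row groups, so that a range filter can skip whole files and row groups can be sketched as follows. This is a minimal illustration under stated assumptions, not pg_analytica's or any COPY TO handler's actual code: rows are assumed to be `(key, value)` tuples sorted on a single integer key, plain in-memory lists stand in for Parquet files (no Parquet library is used), and `ExportFile`, `RowGroup`, `export_sorted`, `scan_range` and the `part-NNNNN` names are all hypothetical. Real Parquet files store per-column min/max statistics for each row group in the file footer, which readers can use for this kind of skipping.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class RowGroup:
    # Hypothetical stand-in for a Parquet row group with min/max key stats.
    min_key: int
    max_key: int
    rows: list[tuple[int, Any]]


@dataclass
class ExportFile:
    # Hypothetical stand-in for one exported Parquet file.
    name: str
    min_key: int
    max_key: int
    row_groups: list[RowGroup]


def export_sorted(rows: Iterable[tuple[int, Any]],
                  rows_per_file: int,
                  rows_per_group: int) -> list[ExportFile]:
    """Split rows into files, and each file into row groups, recording key ranges."""
    ordered = sorted(rows, key=lambda r: r[0])
    files: list[ExportFile] = []
    for f_start in range(0, len(ordered), rows_per_file):
        file_rows = ordered[f_start:f_start + rows_per_file]
        groups = []
        for g_start in range(0, len(file_rows), rows_per_group):
            g = file_rows[g_start:g_start + rows_per_group]
            # Rows are sorted, so the first/last keys are the group's min/max.
            groups.append(RowGroup(g[0][0], g[-1][0], g))
        files.append(ExportFile(f"part-{len(files):05d}",
                                file_rows[0][0], file_rows[-1][0], groups))
    return files


def scan_range(files: list[ExportFile], lo: int, hi: int):
    """Return rows with lo <= key <= hi and the number of row groups actually read."""
    result = []
    groups_read = 0
    for f in files:
        if f.max_key < lo or f.min_key > hi:
            continue  # skip the whole file using its key range alone
        for g in f.row_groups:
            if g.max_key < lo or g.min_key > hi:
                continue  # skip this row group using its key range alone
            groups_read += 1
            result.extend(r for r in g.rows if lo <= r[0] <= hi)
    return result, groups_read
```

For example, `export_sorted([(k, f"v{k}") for k in range(100)], rows_per_file=25, rows_per_group=10)` yields 4 files of 3 row groups each, and `scan_range(files, 30, 42)` reads only 2 of the 12 row groups to return keys 30 through 42. Smaller row groups allow finer skipping at the cost of more metadata, which is why making the size configurable seems useful.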