Re: Columnar format export in Postgres

From: Sushrut Shivaswamy <sushrut(dot)shivaswamy(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Columnar format export in Postgres
Date: 2024-06-13 17:00:24
Message-ID: CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg@mail.gmail.com
Lists: pgsql-hackers

Thanks for the response.

I had considered using COPY TO to export columnar data but gave up on it
since the formats weren't extensible.
It's great to see that you are making it extensible.

I'm still going through the thread of comments on your patch but I have
some early thoughts about using it for columnar data export.

- To maintain data freshness there would need to be a way to schedule
exports using `COPY TO 'parquet'` periodically.
- pg_analytica already has the scheduling logic. Once the extensible
COPY TO is available, it can be used to export the data instead of the
chunked table reads used currently.

- To facilitate efficient querying it would help to export multiple
parquet files for the table instead of a single file.
Having multiple files allows queries to skip chunks if the key range in
the chunk does not match the query's filter criteria.
Even within a chunk it would help to be able to configure the size of a
row group.
- I'm not sure how these parameters will be exposed within `COPY TO`.
Or maybe the extension implementing the `COPY TO` handler will
allow this configuration?

- Regarding using file_fdw to read Apache Arrow and Apache Parquet files
because file_fdw is based on COPY FROM:
- I'm not too clear on this. file_fdw seems to allow creating a table
from data on disk exported using COPY TO.
But is the newly created table still using the data on disk (maybe in
columnar format or CSV), or is it just reading that data to create a
row-based table?
I'm not aware of any capability in the Postgres planner to read
columnar files currently without using an extension like parquet_fdw.
- For your use case, how do you plan to query the Arrow / Parquet
data?
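
To illustrate the earlier point about skipping chunks by key range, here
is a minimal sketch in Python. It is not tied to any real COPY TO handler
or Parquet library: the `ChunkStats` type, the `chunks_to_scan` helper and
the file names are all hypothetical, and the key is assumed to be an
integer with per-file min/max values recorded at export time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkStats:
    """Hypothetical min/max key statistics for one exported file (or row group)."""
    name: str
    min_key: int
    max_key: int


def chunks_to_scan(chunks: List[ChunkStats], lo: int, hi: int) -> List[ChunkStats]:
    """Keep only chunks whose [min_key, max_key] range overlaps lo <= key <= hi."""
    return [c for c in chunks if c.max_key >= lo and c.min_key <= hi]


# Assumed layout: one table exported as three files, partitioned by key range.
chunks = [
    ChunkStats("part-0.parquet", 0, 999),
    ChunkStats("part-1.parquet", 1000, 1999),
    ChunkStats("part-2.parquet", 2000, 2999),
]

# A filter such as "key BETWEEN 1500 AND 2100" only needs two of the three files.
selected = chunks_to_scan(chunks, 1500, 2100)
print([c.name for c in selected])
```

The same overlap test would apply one level down to row groups inside a
file, which is why a configurable row group size matters: smaller groups
give finer-grained skipping at the cost of more metadata.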
