Re: Columnar format export in Postgres

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sushrut(dot)shivaswamy(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Columnar format export in Postgres
Date: 2024-06-15 21:32:20
Message-ID: 20240616.063220.999225191405879719.kou@clear-code.com
Lists: pgsql-hackers

Hi,

In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg(at)mail(dot)gmail(dot)com>
"Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
Sushrut Shivaswamy <sushrut(dot)shivaswamy(at)gmail(dot)com> wrote:

> - To facilitate efficient querying it would help to export multiple
> parquet files for the table instead of a single file.
> Having multiple files allows queries to skip chunks if the key range in
> the chunk does not match query filter criteria.
> Even within a chunk it would help to be able to configure the size of a
> row group.
> - I'm not sure how these parameters will be exposed within `COPY TO`.
> Or maybe the extension implementing the `COPY TO` handler will
> allow this configuration?

Yes. But adding support for custom COPY TO options is
out of scope for the first version, which will focus on
minimal features only. We can improve it later based on
use cases.

See also: https://www.postgresql.org/message-id/20240131.141122.279551156957581322.kou%40clear-code.com
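
For illustration only, here is a minimal sketch of why multiple files and
a configurable row group size help, as described in the quoted text. It is
plain Python and does not use any PostgreSQL or Apache Parquet API: the
RowGroup class, split_into_row_groups(), scan() and the row_group_size
parameter are all hypothetical names invented for this example. Each group
keeps the minimum and maximum key it contains, and a scan skips any group
whose key range cannot match the filter.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for one Parquet row group (or one exported file)."""
    min_key: int
    max_key: int
    rows: list  # list of (key, value) tuples


def split_into_row_groups(rows, row_group_size):
    """Sort rows by key and cut them into fixed-size groups with min/max stats."""
    ordered = sorted(rows, key=lambda r: r[0])
    groups = []
    for start in range(0, len(ordered), row_group_size):
        chunk = ordered[start:start + row_group_size]
        groups.append(RowGroup(chunk[0][0], chunk[-1][0], chunk))
    return groups


def scan(groups, lo, hi):
    """Return rows with lo <= key <= hi and the number of groups actually read."""
    matched, groups_read = [], 0
    for g in groups:
        if g.max_key < lo or g.min_key > hi:
            continue  # key range cannot match the filter: skip without reading
        groups_read += 1
        matched.extend(r for r in g.rows if lo <= r[0] <= hi)
    return matched, groups_read


# Assumed sample data: 100 rows with keys 0..99, 10 rows per group.
rows = [(k, f"v{k}") for k in range(100)]
groups = split_into_row_groups(rows, row_group_size=10)
matched, groups_read = scan(groups, 25, 34)
print(len(groups), groups_read, len(matched))
```

A smaller row_group_size gives finer-grained skipping at the cost of more
per-group metadata, which is why making it configurable is useful.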

> - Regarding using file_fdw to read Apache Arrow and Apache Parquet files
> because file_fdw is based on COPY FROM:
> - I'm not too clear on this. file_fdw seems to allow creating a table
> from data on disk exported using COPY TO.

Correct.

> But is the newly created table still using the data on disk (maybe in
> columnar format or csv) or is it just reading that data to create a row
> based table?

The former.

> I'm not aware of any capability in the postgres planner to read
> columnar files currently without using an extension like parquet_fdw.

Correct. We still need another approach, such as parquet_fdw
together with the extensible COPY format feature, to optimize
queries against Apache Parquet data. file_fdw can only read
Apache Parquet data via SELECT. Sorry for the confusion.

Thanks,
--
kou
