Re: Columnar format export in Postgres

From:	Sutou Kouhei <kou(at)clear-code(dot)com>
To:	sushrut(dot)shivaswamy(at)gmail(dot)com
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Columnar format export in Postgres
Date:	2024-06-15 21:32:20
Message-ID:	20240616.063220.999225191405879719.kou@clear-code.com
Lists:	pgsql-hackers

Hi,

In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=Q7VvOZ20uhG7Vxg33ZBLDg(at)mail(dot)gmail(dot)com>
"Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
Sushrut Shivaswamy <sushrut(dot)shivaswamy(at)gmail(dot)com> wrote:

> - To facilitate efficient querying it would help to export multiple
> parquet files for the table instead of a single file.
> Having multiple files allows queries to skip chunks if the key range in
> the chunk does not match query filter criteria.
> Even within a chunk it would help to be able to configure the size of a
> row group.
> - I'm not sure how these parameters will be exposed within `COPY TO`.
> Or maybe the extension implementing the `COPY TO` handler will
> allow this configuration?

Yes. But adding support for custom COPY TO options is out of
scope for the first version. We will focus on only the
minimal features in the first version. We can improve it
later based on use cases.
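
For illustration only, the chunk skipping described in the quoted text can be
sketched in a few lines of Python. This is not the Apache Parquet API or
anything in the proposed COPY TO handler: `Chunk`, `split_into_chunks`, `scan`
and the `chunk_size` parameter are hypothetical names, and each chunk stands in
for one exported file (or one row group) that carries min/max statistics for
the key column.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for one exported file or row group."""
    name: str
    min_key: int                  # smallest key stored in this chunk
    max_key: int                  # largest key stored in this chunk
    rows: List[Tuple[int, str]]


def split_into_chunks(rows, chunk_size):
    """Sort rows by key and cut them into chunks of at most chunk_size rows."""
    ordered = sorted(rows, key=lambda r: r[0])
    chunks = []
    for start in range(0, len(ordered), chunk_size):
        part = ordered[start:start + chunk_size]
        chunks.append(Chunk(f"part-{len(chunks)}", part[0][0], part[-1][0], part))
    return chunks


def scan(chunks, lo, hi):
    """Return rows with lo <= key <= hi and the names of the chunks read."""
    read, matched = [], []
    for chunk in chunks:
        # The chunk's key range cannot overlap the filter: skip it unread.
        if chunk.max_key < lo or chunk.min_key > hi:
            continue
        read.append(chunk.name)
        matched.extend(r for r in chunk.rows if lo <= r[0] <= hi)
    return matched, read


rows = [(k, f"row-{k}") for k in range(100)]
chunks = split_into_chunks(rows, chunk_size=25)
matched, read = scan(chunks, lo=30, hi=40)
print(read, len(matched))
```

With a smaller `chunk_size`, each key range gets narrower and more chunks can
be skipped, at the cost of more files or row groups to track. That trade-off is
why the chunk and row group sizes are worth making configurable.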

> - Regarding using file_fdw to read Apache Arrow and Apache Parquet file
> because file_fdw is based on COPY FROM:
> - I'm not too clear on this. file_fdw seems to allow creating a table
> from data on disk exported using COPY TO.

Correct.

> But is the newly created table still using the data on disk(maybe in
> columnar format or csv) or is it just reading that data to create a row
> based table.

The former.

> I'm not aware of any capability in the postgres planner to read
> columnar files currently without using an extension like parquet_fdw.

Correct. We still need another approach, such as parquet_fdw
combined with the extensible COPY format feature, to optimize
queries against Apache Parquet data. file_fdw can only read
Apache Parquet data via SELECT. Sorry for the confusion.

Thanks,
--
kou
