Re: Make COPY extendable in order to support Parquet and other formats

From: Andres Freund <andres(at)anarazel(dot)de>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make COPY extendable in order to support Parquet and other formats
Date: 2022-06-22 23:49:08
Message-ID: 20220622234908.jkmc6qg352dsh5x5@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote:
> On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev
> <aleksander(at)timescale(dot)com> wrote:
>
> >
> > In other words, personally I'm unaware of use cases when somebody
> > needs a complete read/write FDW or TableAM implementation for formats
> > like Parquet, ORC, etc. Also to my knowledge they are not particularly
> > optimized for this.
> >
>
> IIUC, you want extensibility in FORMAT argument to COPY command
> https://www.postgresql.org/docs/current/sql-copy.html. Where the
> format is pluggable. That seems useful.

Agreed.

But I think it needs quite a bit of care. Just plugging in a bunch of per-row
(or worse, per field) switches to COPYs input / output parsing will make the
code even harder to read and even slower.

I suspect that we'd first need a patch to refactor the existing copy code a
good bit to clean things up. After that it hopefully will be possible to plug
in a new format without being too intrusive.

I know little about parquet - can it support FROM STDIN efficiently?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-06-22 23:58:45 Re: pg_upgrade (12->14) fails on aggregate
Previous Message Jacob Champion 2022-06-22 23:36:00 Re: [PoC] Let libpq reject unexpected authentication requests