Re: csv_populate_recordset and csv_agg

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Steve Chavez <steve(at)supabase(dot)io>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: csv_populate_recordset and csv_agg
Date: 2022-10-24 02:51:00
Message-ID: 3661504.1666579860@sss.pgh.pa.us
Lists: pgsql-hackers

Steve Chavez <steve(at)supabase(dot)io> writes:
> CSV processing is also a common use case and PostgreSQL has the COPY ..
> FROM .. CSV form but COPY is not compatible with libpq pipeline mode and
> the interface is clunkier to use.

> I propose to include two new functions:

> - csv_populate_recordset ( base anyelement, from_csv text )
> - csv_agg ( anyelement )

The trouble with CSV is there are so many mildly-incompatible
versions of it. I'm okay with supporting it in COPY, where
we have the freedom to add random sub-options (QUOTE, ESCAPE,
FORCE_QUOTE, yadda yadda) to cope with those variants.
I don't see a nice way to handle that issue in the functions
you propose --- you'd have to assume that there is One True CSV,
which sadly ain't so, or else complicate the functions beyond
usability.
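
To make the dialect problem concrete, here is a minimal sketch using Python's standard csv module rather than anything in PostgreSQL. The helper name parse and the sample line are made up for illustration. It shows one input line producing different fields depending on which quote character the reader is told to assume, which is the kind of choice COPY exposes through its QUOTE option.

```python
import csv
import io


def parse(text, **dialect_options):
    """Parse CSV text with the given csv.reader formatting options.

    `parse` is a hypothetical helper for this illustration only.
    """
    return list(csv.reader(io.StringIO(text), **dialect_options))


# One line of input; the producer used single quotes around a field
# that contains the delimiter.
line = "1,'x,y',z\n"

# Reader assumes the default quote character (a double quote), so the
# single quotes are treated as ordinary data and the field is split.
as_double_quoted = parse(line)

# Reader is told the quote character is a single quote, so the comma
# inside the quoted field is kept as data.
as_single_quoted = parse(line, quotechar="'")

print(as_double_quoted)
print(as_single_quoted)
```

A function with a fixed signature such as csv_populate_recordset(base, from_csv) has no obvious place to carry the equivalent of quotechar, so it would either have to pick one convention or grow extra parameters.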

Also, in the end CSV is a surface presentation layer, and as
such it's not terribly well suited as the calculation representation
for aggregates and other functions. I think these proposed functions
would have pretty terrible performance as a consequence of the
need to constantly re-parse the surface format. The same point
could be made about JSON ... which is why we prefer to implement
processing functions with JSONB.
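
The re-parsing cost can be sketched in a few lines of Python. This is only an analogy: a Python dict stands in for a pre-parsed representation such as JSONB, the function names are hypothetical, and nothing here reflects how PostgreSQL implements either type. The sketch counts how often the text has to be parsed when each accessor starts from the surface format, compared with parsing once up front.

```python
import json

parse_calls = 0  # counts how many times the text form is parsed


def parse(text):
    """Parse JSON text, recording that a parse happened."""
    global parse_calls
    parse_calls += 1
    return json.loads(text)


def get_field_from_text(doc_text, key):
    """Accessor over the text form: parses the whole document per call."""
    return parse(doc_text)[key]


def get_field_from_parsed(doc, key):
    """Accessor over the already-parsed form: no parsing needed."""
    return doc[key]


doc_text = '{"a": 1, "b": 2, "c": 3}'
keys = ("a", "b", "c")

# Working on the text form: one full parse per accessor call.
text_total = sum(get_field_from_text(doc_text, k) for k in keys)
text_parse_calls = parse_calls

# Working on the parsed form: parse once, then reuse the result.
parse_calls = 0
doc = parse(doc_text)
parsed_total = sum(get_field_from_parsed(doc, k) for k in keys)
parsed_parse_calls = parse_calls

print(text_total, text_parse_calls)
print(parsed_total, parsed_parse_calls)
```

Both paths compute the same total, but the text-based path parses the document once per field accessed, while the parsed path does so a single time.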

regards, tom lane
