Re: COPY TO STDOUT Apache Arrow support

From:	Adam Lippai <adam(at)rigo(dot)sk>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: COPY TO STDOUT Apache Arrow support
Date:	2023-05-03 03:14:44
Message-ID:	CAGrfaBW6TywikPCo11==rhRR0qWXuWUfikNOH+8hUPd-uvs+jg@mail.gmail.com
Lists:	pgsql-hackers

Hi,

There is also a new Arrow C library, nanoarrow (one .h and one .c file), which
makes Arrow easier to use from the PostgreSQL codebase.

https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
https://github.com/apache/arrow-nanoarrow/tree/main/dist

Best regards,
Adam Lippai

On Thu, Apr 13, 2023 at 2:35 PM Adam Lippai <adam(at)rigo(dot)sk> wrote:

> Hi,
>
> There are two major developments on this topic:
>
> 1. Pandas 2.0 has been released, and it can use Apache Arrow as a backend.
> 2. Apache Arrow ADBC has been released, which standardizes the client API.
> Currently it uses the PostgreSQL wire protocol underneath.
>
> Best regards,
> Adam Lippai
>
> On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai <adam(at)rigo(dot)sk> wrote:
>
>> Hi,
>>
>> Would it be possible to add the Apache Arrow streaming format to the copy
>> backend + frontend?
>> The use case is fetching (or storing) tens or hundreds of millions of
>> rows for client-side data science purposes (Pandas, Apache Arrow compute
>> kernels, Parquet conversion, etc.). It looks like the serialization overhead
>> when using the PostgreSQL wire format can be significant.
>>
>> Best regards,
>> Adam Lippai
>>
>

In response to

Re: COPY TO STDOUT Apache Arrow support at 2023-04-13 18:35:48 from Adam Lippai

Responses

Re: COPY TO STDOUT Apache Arrow support at 2023-05-03 04:01:27 from Pavel Stehule
