From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Adam Lippai <adam(at)rigo(dot)sk> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: COPY TO STDOUT Apache Arrow support |
Date: | 2023-05-03 04:01:27 |
Message-ID: | CAFj8pRCyROOXvg10wxRRgdfU2GbS3L6-zUCw5pvgbdmYzNx_jQ@mail.gmail.com |
Lists: | pgsql-hackers |
Hi
On Wed, May 3, 2023 at 5:15 AM Adam Lippai <adam(at)rigo(dot)sk> wrote:
> Hi,
>
> There is also a new Arrow C library (one .h and one .c file), which makes
> it easier to use Arrow from the PostgreSQL codebase.
>
> https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
> https://github.com/apache/arrow-nanoarrow/tree/main/dist
>
> Best regards,
> Adam Lippai
>
With commit 9fcdf2c787ac6da330165ea3cd50ec5155943a2b, it can be implemented
in an extension.
Regards
Pavel
> On Thu, Apr 13, 2023 at 2:35 PM Adam Lippai <adam(at)rigo(dot)sk> wrote:
>
>> Hi,
>>
>> There are two bigger developments in this topic:
>>
>> 1. Pandas 2.0 has been released, and it can use Apache Arrow as a backend.
>> 2. Apache Arrow ADBC has been released, which standardizes the client API.
>> Currently it uses the PostgreSQL wire protocol underneath.
>>
>> Best regards,
>> Adam Lippai
>>
>> On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai <adam(at)rigo(dot)sk> wrote:
>>
>>> Hi,
>>>
>>> would it be possible to add Apache Arrow streaming format to the copy
>>> backend + frontend?
>>> The use case is fetching (or storing) tens or hundreds of millions of
>>> rows for client side data science purposes (Pandas, Apache Arrow compute
>>> kernels, Parquet conversion etc). It looks like the serialization overhead
>>> when using the postgresql wire format can be significant.
>>>
>>> Best regards,
>>> Adam Lippai
>>>
>>