Re: Turbo ODBC

From: "Uwe L(dot) Korn" <uwelk(at)xhochy(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Wes McKinney <wesmckinn(at)gmail(dot)com>, Matthew Rocklin <mrocklin(at)continuum(dot)io>
Cc: psycopg(at)postgresql(dot)org, michael(dot)koenig(at)blue-yonder(dot)com
Subject: Re: Turbo ODBC
Date: 2017-01-17 15:18:41
Message-ID: 1484666321.1044625.850459792.6F63D407@webmail.messagingengine.com
Lists: psycopg

In Arrow, each column carries a bitmap indicating whether each value is
NULL. We can convert this cleanly to NumPy masked arrays, but once the
data is converted to Pandas, integer columns with NULLs are converted
to floats with NaN representing NULL, as there is no explicit NULL
representation in Pandas 0.x.
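
To make the above concrete, here is a minimal sketch of the idea using
only NumPy. It is not turbodbc or Arrow code: the helper names
bitmap_to_masked and masked_to_float_with_nan are made up for this
example, and the bitmap layout (least-significant bit first, 1 meaning
"valid") is assumed to follow the Arrow format. The last step only
mimics what happens on the way into Pandas, where NaN forces a float
dtype.

```python
import numpy as np


def bitmap_to_masked(values, validity_bitmap):
    """Combine a values array and a validity bitmap into a masked array.

    Hypothetical helper: bit i (LSB first within each byte) is assumed
    to be 1 when values[i] is valid and 0 when it is NULL.
    """
    raw = np.frombuffer(validity_bitmap, dtype=np.uint8)
    bits = np.unpackbits(raw, bitorder="little")
    valid = bits[: len(values)].astype(bool)
    # NumPy masks mark the *invalid* entries, so invert the validity bits.
    return np.ma.masked_array(values, mask=~valid)


def masked_to_float_with_nan(masked):
    """Mimic the Pandas 0.x outcome: cast to float and use NaN for NULL."""
    return masked.astype(np.float64).filled(np.nan)


values = np.array([10, 20, 30, 40], dtype=np.int64)
bitmap = bytes([0b00001011])  # rows 0, 1 and 3 valid; row 2 is NULL

masked = bitmap_to_masked(values, bitmap)      # still int64, NULL kept as mask
as_float = masked_to_float_with_nan(masked)    # float64, NULL becomes NaN

print(masked)
print(as_float)
```

The masked array keeps the integer dtype and tracks the NULL separately,
whereas the float version loses the integer type because NaN only
exists as a floating-point value.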

--
Uwe L. Korn
uwelk(at)xhochy(dot)com

On Tue, Jan 17, 2017, at 04:06 PM, Jim Nasby wrote:
> On 1/17/17 4:51 AM, Uwe L. Korn wrote:
> > One important thing for fast columnar data access is that you don't want
> > to have the data as Python objects before they will be turned into a
> > DataFrame. Besides much better buffering, this was one of the main
> > advantages we have with Turbodbc. Given that the ODBC drivers for
> > Postgres seem to be in a miserable state, it would be much preferable to
> > have such functionality directly in psycopg2. From meetings with
> > people at some PyData conferences to whom I showed turbodbc, I can
> > definitely say that there are some users out there who would like a
> > fast path for Postgres-to-Pandas.
> >
> > In turbodbc, there are two additional functions added to the DB-API
> > cursor object: fetchallnumpy and fetchallarrow. These suffice mostly for
> > the typical pandas workloads. The experience from implementing this is
> > basically that with Arrow it was quite simple to add a columnar
> > interface as most of the data conversions were handled by Arrow. Also
> > there was no need for me to interface with any Python types as the
> > language "barrier" was transparently handled by Arrow.
>
> I certainly see the advantages to not creating objects. How do you end
> up handling NULLs?
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)
