Re: Disk buffering of resultsets

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: "Lussier, Denis" <denisl(at)openscg(dot)com>
Cc: Enrico Olivelli - Diennea <enrico(dot)olivelli(at)diennea(dot)com>, Vitalii Tymchyshyn <vit(at)tym(dot)im>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John R Pierce <pierce(at)hogranch(dot)com>, PG-JDBC Mailing List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Disk buffering of resultsets
Date: 2014-10-15 02:30:22
Message-ID: CADK3HH+Cyg3c+LWMtidUbGK+DHk1H850EtKuDSLXpnqNNYcaqw@mail.gmail.com
Lists: pgsql-jdbc

Actually, doing a 9.4 driver isn't a huge deal. What protocol specifics are
you referring to?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 14 October 2014 21:22, Lussier, Denis <denisl(at)openscg(dot)com> wrote:

> I don't think I've heard any talk of using features that are 9.4
> server/protocol specific. You'll of course need the updated JDBC driver,
> and (I'm taking an educated guess here) this is too much to check in
> for the 9.4 JDBC driver release. Perhaps it could be an experimental
> feature that is optionally compiled in during the early days of the
> iterative design, develop/test, tweak cycle, before stability is reached.
>
> On Tue, Oct 14, 2014 at 4:09 AM, Enrico Olivelli - Diennea <
> enrico(dot)olivelli(at)diennea(dot)com> wrote:
>
>> Hi,
>>
>> We can help by running some benchmarks on our platform as soon as an
>> alpha/beta is available.
>>
>> We are longing for this series of improvements!
>>
>> I hope that these features can be used with a 9.3 server and that we
>> won’t need to upgrade to 9.4 (which is not yet stable).
>>
>>
>>
>> Thank you very much
>>
>>
>>
>> *Enrico Olivelli*
>> Software Development Manager @Diennea
>> Tel.: (+39) 0546 066100 - Int. 925
>> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>>
>> MagNews - E-mail Marketing Solutions
>> http://www.magnews.it
>> Diennea - Digital Marketing Solutions
>> http://www.diennea.com
>>
>>
>>
>>
>>
>> *From:* pgsql-jdbc-owner(at)postgresql(dot)org [mailto:
>> pgsql-jdbc-owner(at)postgresql(dot)org] *On behalf of* Vitalii Tymchyshyn
>> *Sent:* Monday, 13 October 2014 17:34
>> *To:* Craig Ringer
>> *Cc:* Tom Lane; John R Pierce; PG-JDBC Mailing List
>> *Subject:* Re: [JDBC] Disk buffering of resultsets
>>
>>
>>
>> Hello, again.
>>
>>
>>
>> Sorry for the pause, I had a really busy week. Yet it allowed me to think
>> a little more.
>>
>> As I see it, there are three goals that can be addressed
>> independently:
>>
>>
>>
>> 1) Prevent OOMs
>>
>> Unfortunately, this can be addressed only by saving data outside the heap.
>> The approach in my draft would still OOM when a secondary query arrives.
>>
>> Note that a secondary query is not that unusual. It is typically used,
>> without any multithreading, to perform a client-side join, e.g. when a
>> complicated inheritance scenario is in place, or to load some dictionary
>> data without much duplication (e.g. only a few wide dictionary entries for
>> a long query), ...
>>
>> I am still thinking of doing it without much parsing (you would need the
>> record type and size and that's all, with no field parsing), by simply
>> copying the data as-is to a temp file. Pluggable interfaces can be added
>> later if needed.
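
A minimal sketch of the as-is copy described in (1), assuming the PostgreSQL v3 wire framing: one type byte followed by a 4-byte big-endian length that counts itself but not the type byte. `RawMessageSpooler`, `spool`, and `spoolToTempFile` are hypothetical names used for illustration, not pgjdbc code. For simplicity the loop reads until end of stream; a real driver would presumably stop at the end of the current result instead.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of pgjdbc. */
final class RawMessageSpooler {

    /**
     * Copies framed messages from in to out without decoding any fields.
     * Only the type byte and the length word are inspected.
     * Returns the number of messages copied.
     */
    static int spool(InputStream in, OutputStream out) throws IOException {
        DataInputStream din = new DataInputStream(in);
        DataOutputStream dout = new DataOutputStream(out);
        byte[] buf = new byte[8192];
        int count = 0;
        while (true) {
            int type = din.read();
            if (type < 0) {
                break; // clean end of stream between messages
            }
            int length = din.readInt(); // includes the 4 length bytes themselves
            if (length < 4) {
                throw new IOException("Invalid message length: " + length);
            }
            dout.writeByte(type);
            dout.writeInt(length);
            int remaining = length - 4;
            while (remaining > 0) {
                int n = din.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("Truncated message body");
                }
                dout.write(buf, 0, n);
                remaining -= n;
            }
            count++;
        }
        dout.flush();
        return count;
    }

    /** Spools everything readable from in into a new temp file and returns its path. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("resultset-", ".spool");
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            spool(in, out);
        }
        return tmp;
    }
}
```

Because the payload is never decoded, the heap cost is bounded by the copy buffer regardless of row width, and the spooled file can later be replayed through the same parser that would have read the socket.
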
>>
>>
>>
>> 2) Fast first record
>>
>> Here we need to introduce strategies for "who does the copying, and when"
>> from (1). I propose pluggable strategies with a few predefined ones (see
>> below). The user can pass a predefined strategy name or an Executor as a
>> DataSource parameter, or, where a string is needed (e.g. in the connection
>> URI), a reference to a static method that returns an Executor. This would
>> also make it easy to point to Executors.* methods. We might consider
>> requiring a ScheduledExecutor so that it can also be reused for the
>> QueryTimeout handling.
>>
>>
>>
>> I propose the following predefined strategies:
>>
>> a) Direct executor, which does all loading at the very beginning,
>> potentially saving to a temp file.
>>
>> b) Postponed executor, which works much like my draft: it reads as needed
>> without any disk saving, and saves to disk only when the connection is
>> needed for some other statement.
>>
>> c) JVM-wide Executors.newCachedThreadPool, which starts offloading in
>> parallel once fetchSize is reached.
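
A rough sketch of how a strategy string could be resolved to an Executor, covering (a), (c), and the static-method form mentioned above. The class `OffloadStrategies` and the strategy names "direct" and "cached" are assumptions made up for illustration; nothing here is an existing pgjdbc API. Strategy (b) is left out because it depends on driver-internal knowledge of when the connection is needed again.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

/** Hypothetical resolver for illustration only; not part of pgjdbc. */
final class OffloadStrategies {

    /** Strategy (a): run the offloading task in the calling thread. */
    static final Executor DIRECT = Runnable::run;

    /** Strategy (c): one JVM-wide cached pool, created lazily on first use. */
    private static final class CachedHolder {
        static final Executor POOL = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "resultset-offload");
            t.setDaemon(true); // do not keep the JVM alive for idle workers
            return t;
        });
    }

    /**
     * Resolves a predefined name, or a fully qualified reference to a public
     * static no-argument method that returns an Executor, such as
     * "java.util.concurrent.Executors.newSingleThreadExecutor".
     */
    static Executor resolve(String spec) throws ReflectiveOperationException {
        switch (spec) {
            case "direct":
                return DIRECT;
            case "cached":
                return CachedHolder.POOL;
            default:
                return fromStaticMethod(spec);
        }
    }

    private static Executor fromStaticMethod(String spec) throws ReflectiveOperationException {
        int dot = spec.lastIndexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("Unknown strategy: " + spec);
        }
        Class<?> owner = Class.forName(spec.substring(0, dot));
        Method factory = owner.getMethod(spec.substring(dot + 1));
        if (!Modifier.isStatic(factory.getModifiers())
                || !Executor.class.isAssignableFrom(factory.getReturnType())) {
            throw new IllegalArgumentException("Not a static Executor factory: " + spec);
        }
        return (Executor) factory.invoke(null);
    }
}
```

With something like this, a connection URI could carry either a short name or a method reference as a plain string, and the driver would only ever see an Executor.
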
>>
>>
>>
>> Also, I'd propose setting the default fetchSize to some reasonable value,
>> like 1000, and making one of the strategies (e.g. (a)) the default, so
>> that we won't get an OOM with default settings. Alternatively, we should
>> allow the default fetch size to be set at the connection or data source
>> level (or both).
>>
>>
>>
>> 3) Fast cancel/resultset close.
>>
>> It's the only place where switching to portals is needed, as far as I can
>> see, and it can be done orthogonally to (1) and (2). I don't see any other
>> goal that would benefit from it. To be honest, I am willing to do (1) and
>> (2), but not (3), because it would require me to get much deeper into the
>> protocol, which I know almost nothing about right now.
>>
>>
>>
>> Best regards, Vitalii Tymchyshyn.
>>
>>
>
>
