Re: Disk buffering of resultsets

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Vitalii Tymchyshyn <vit(at)tym(dot)im>
Cc: Enrico Olivelli - Diennea <enrico(dot)olivelli(at)diennea(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John R Pierce <pierce(at)hogranch(dot)com>, PG-JDBC Mailing List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Disk buffering of resultsets
Date: 2014-10-21 11:06:42
Message-ID: CADK3HH+offVAUeAq5ROw3_Kg2p9BXT1zEEiNMLgKV_MX2WkVYA@mail.gmail.com
Lists: pgsql-jdbc

Hi Vitalii,

Thanks for this. I am headed for pgconf; hopefully I will have time to look
at it on the plane.

Dave

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 20 October 2014 22:19, Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:

> Hello, all.
>
> Basically, an alpha of point (1) is available here:
> https://github.com/tivv/pgjdbc/tree/offloading
> It passes all the tests of the cursor-based implementation, but I want to
> add some more (mostly with multiple open statements).
>
> Best regards, Vitalii Tymchyshyn
>
> 2014-10-14 4:09 GMT-04:00 Enrico Olivelli - Diennea <
> enrico(dot)olivelli(at)diennea(dot)com>:
>
>> Hi,
>>
>> We can help by running benchmarks on our platform as soon as an
>> alpha/beta is available.
>>
>> We are eagerly awaiting this series of improvements!
>>
>> I hope these features can be used with a 9.3 server, so that we
>> won’t need to upgrade to 9.4 (which is not yet stable).
>>
>>
>>
>> Thank you very much
>>
>>
>>
>> *Enrico Olivelli*
>> Software Development Manager @Diennea
>> Tel.: (+39) 0546 066100 - Int. 925
>> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>>
>> MagNews - E-mail Marketing Solutions
>> http://www.magnews.it
>> Diennea - Digital Marketing Solutions
>> http://www.diennea.com
>>
>>
>>
>>
>>
>> *From:* pgsql-jdbc-owner(at)postgresql(dot)org [mailto:
>> pgsql-jdbc-owner(at)postgresql(dot)org] *On behalf of* Vitalii Tymchyshyn
>> *Sent:* Monday, 13 October 2014 17:34
>> *To:* Craig Ringer
>> *Cc:* Tom Lane; John R Pierce; PG-JDBC Mailing List
>> *Subject:* Re: [JDBC] Disk buffering of resultsets
>>
>>
>>
>> Hello, again.
>>
>>
>>
>> Sorry for the pause; I had a really busy week. It did give me time to
>> think a little more, though.
>>
>> As I see it, there are three goals that can be addressed
>> independently:
>>
>>
>>
>> 1) Prevent OOMs
>>
>> Unfortunately, this can be addressed only by saving data outside the
>> heap. The approach in my draft would still OOM when a secondary query
>> arrives.
>>
>> Note that a secondary query is not that unusual. It is typically used,
>> without any multithreading, to perform a client-side join, e.g. when a
>> complicated inheritance scenario is in place, or to load some dictionary
>> data without much duplication (e.g. only a few wide dictionary entries
>> for a long query), ...
>>
>> I am still thinking of doing it without much parsing (only the record
>> type and size are needed, with no field parsing) by simply copying the
>> data as-is to a temp file. Pluggable interfaces can be added later if
>> needed.
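
A minimal sketch of the "copy as-is" idea above, assuming the v3 wire format in which every backend message is a one-byte type followed by a four-byte big-endian length that counts itself but not the type byte. RowOffloader and copyRows are hypothetical names for illustration, not code from pgjdbc or from the offloading branch. Only the type and the length are inspected; the row payload is never parsed.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration; not part of pgjdbc. */
final class RowOffloader {
    /** Type byte of a DataRow message in protocol v3. */
    static final int DATA_ROW = 'D';

    private RowOffloader() {}

    /**
     * Copies consecutive DataRow messages from source to sink byte for byte
     * and returns how many were copied. Stops at end of stream or at the
     * first message of another type, which is left unread for the caller.
     */
    static int copyRows(InputStream source, OutputStream sink) {
        if (!source.markSupported()) {
            throw new IllegalArgumentException("source must support mark/reset");
        }
        DataInputStream in = new DataInputStream(source);
        DataOutputStream out = new DataOutputStream(sink);
        byte[] buffer = new byte[8192];
        int rows = 0;
        try {
            while (true) {
                source.mark(1);
                int type = in.read();
                if (type != DATA_ROW) {
                    if (type != -1) {
                        source.reset(); // un-read the type byte of the next message
                    }
                    out.flush();
                    return rows;
                }
                int length = in.readInt(); // includes these 4 bytes, not the type byte
                if (length < 4) {
                    throw new IOException("invalid message length " + length);
                }
                out.writeByte(type);
                out.writeInt(length);
                int remaining = length - 4;
                while (remaining > 0) {
                    int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
                    if (n < 0) {
                        throw new EOFException("truncated DataRow message");
                    }
                    out.write(buffer, 0, n);
                    remaining -= n;
                }
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the temp-file case the sink could be an output stream opened on a file created with java.nio.file.Files.createTempFile; reading the rows back later is then the same framing loop in reverse.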
>>
>>
>>
>> 2) Fast first record
>>
>> Here we need to introduce strategies for "who does the copying, and
>> when" from (1). I propose pluggable strategies with a few predefined ones
>> (see below). The user can pass a predefined strategy name or an Executor
>> as a DataSource parameter, or, where a string is needed (e.g. in the
>> connection URI), a reference to a static method that returns an Executor.
>> This would also make it easy to point to the Executors.* methods. We may
>> consider requiring a ScheduledExecutor so that it can also be reused for
>> the QueryTimeout handling.
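
One way the string form could be resolved, as a rough sketch only. ExecutorSpecs, resolve, and the strategy name "direct" are assumptions made for this example; nothing here is an existing pgjdbc parameter. A string is treated either as a predefined name or as "fully.qualified.Class.method" naming a public static no-argument method that returns an Executor.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.concurrent.Executor;

/** Hypothetical resolver for illustration; not part of pgjdbc. */
final class ExecutorSpecs {
    private ExecutorSpecs() {}

    /** Turns a strategy string into an Executor, or throws IllegalArgumentException. */
    static Executor resolve(String spec) {
        if ("direct".equals(spec)) { // assumed predefined strategy name
            return Runnable::run;    // runs the task on the calling thread
        }
        int dot = spec.lastIndexOf('.');
        if (dot <= 0 || dot == spec.length() - 1) {
            throw new IllegalArgumentException("unknown strategy: " + spec);
        }
        try {
            // Split "pkg.Class.method" into the class name and the method name.
            Class<?> owner = Class.forName(spec.substring(0, dot));
            Method factory = owner.getMethod(spec.substring(dot + 1));
            if (!Modifier.isStatic(factory.getModifiers())
                    || !Executor.class.isAssignableFrom(factory.getReturnType())) {
                throw new IllegalArgumentException(spec + " is not a static Executor factory");
            }
            return (Executor) factory.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot resolve " + spec, e);
        }
    }
}
```

With this, a value such as java.util.concurrent.Executors.newCachedThreadPool would work directly. Since the string may come from a connection URI, a real implementation would probably want to restrict which classes can be loaded this way.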
>>
>>
>>
>> I propose the following predefined strategies:
>>
>> a) Direct executor, which does all the loading at the very beginning,
>> potentially saving to a temp file.
>>
>> b) Postponed executor, which works much like my draft: it reads as needed
>> without any disk saving, and saves to disk only when the connection is
>> needed for some other statement.
>>
>> c) JVM-wide Executors.newCachedThreadPool, which starts offloading in
>> parallel as soon as fetchSize is reached.
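
The three strategies (a) to (c) map fairly naturally onto java.util.concurrent.Executor. The sketch below is only an illustration under that assumption: OffloadStrategies, Postponed, and runPending are made-up names, and the point at which a driver would call runPending (just before another statement needs the connection) is assumed rather than taken from the branch.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical strategy holder for illustration; not part of pgjdbc. */
final class OffloadStrategies {
    private OffloadStrategies() {}

    /** (a) Direct: the offloading task runs at once on the calling thread. */
    static final Executor DIRECT = Runnable::run;

    /** (b) Postponed: tasks wait until the connection is needed elsewhere. */
    static final class Postponed implements Executor {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        @Override
        public synchronized void execute(Runnable task) {
            pending.add(task); // remember the work, do nothing yet
        }

        /** Assumed hook: call before another statement uses the connection. */
        synchronized void runPending() {
            Runnable task;
            while ((task = pending.poll()) != null) {
                task.run();
            }
        }
    }

    /** (c) Parallel: a cached pool that offloads in background threads. */
    static ExecutorService newParallel() {
        return Executors.newCachedThreadPool(runnable -> {
            Thread thread = new Thread(runnable, "offload-worker");
            thread.setDaemon(true); // do not keep the JVM alive for offloading
            return thread;
        });
    }
}
```

For a JVM-wide pool as in (c), the result of newParallel() would be created once and shared. Daemon threads are a design choice of this sketch, not something stated in the proposal.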
>>
>>
>>
>> I'd also propose setting the default fetchSize to some reasonable value,
>> such as 1000, and specifying one of the strategies (e.g. (a)) as the
>> default, so that we won't get an OOM with default settings. Alternatively,
>> we should allow a default fetch size to be set at the connection level,
>> the data source level, or both.
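
If defaults can be set at several levels, the precedence needs to be defined. A small sketch of one possible rule, assuming the most specific non-zero setting wins and that 0 keeps its JDBC meaning of "no hint given". FetchSizes, effective, and the built-in value of 1000 are assumptions taken from the proposal, not existing driver behaviour.

```java
/** Hypothetical precedence rule for illustration; not part of pgjdbc. */
final class FetchSizes {
    /** Built-in fallback suggested in the proposal. */
    static final int BUILT_IN_DEFAULT = 1000;

    private FetchSizes() {}

    /**
     * Returns the fetch size to use: the statement setting if given,
     * else the connection default, else the data source default,
     * else the built-in default. Zero means "not set" at every level.
     */
    static int effective(int statementSize, int connectionDefault, int dataSourceDefault) {
        int[] levels = {statementSize, connectionDefault, dataSourceDefault};
        for (int size : levels) {
            if (size < 0) {
                throw new IllegalArgumentException("fetch size must be >= 0: " + size);
            }
        }
        for (int size : levels) {
            if (size > 0) {
                return size; // first explicit setting, most specific level first
            }
        }
        return BUILT_IN_DEFAULT;
    }
}
```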
>>
>>
>>
>> 3) Fast cancel/resultset close
>>
>> As far as I can see, this is the only place where switching to portals is
>> needed, and it can be done orthogonally to (1) and (2). I don't see any
>> other goal that would benefit from it. To be honest, I am willing to do
>> (1) and (2), but not (3), because that would require me to get much
>> deeper into the protocol, which I know almost nothing about right now.
>>
>>
>>
>> Best regards, Vitalii Tymchyshyn.
>>
>> ------------------------------
>> Stay up to date on the world of email marketing and digital marketing:
>> visit our blog! http://blog.magnews.it
>>
>
>
