Re: Disk buffering of resultsets

From: Vitalii Tymchyshyn <vit(at)tym(dot)im>
To: Enrico Olivelli - Diennea <enrico(dot)olivelli(at)diennea(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John R Pierce <pierce(at)hogranch(dot)com>, PG-JDBC Mailing List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Disk buffering of resultsets
Date: 2014-10-21 02:19:48
Message-ID: CABWW-d06uTMsLcLx7YHbxZFu8L=ssJSed3pD0j0rMF3JnzBtTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hello, all.

Basically, alpha of p.1 is available here:
https://github.com/tivv/pgjdbc/tree/offloading
It passes all the tests of cursor-based implementation, but I want to add
some more (mostly with multiple open statements).

Best regards, Vitalii Tymchyshyn

2014-10-14 4:09 GMT-04:00 Enrico Olivelli - Diennea <
enrico(dot)olivelli(at)diennea(dot)com>:

> Hi,
>
> we can give support doing some benchs with our platform as soon as some
> ALFA/BETA will be available.
>
> We are longing for this series of improvements!
>
> I hope that these features could be used with 9.3 server and that we won’t
> need to upgrade to 9.4 (which is not still stable )
>
>
>
> Thank you very much
>
>
>
> *Enrico Olivelli*
> Software Development Manager @Diennea
> Tel.: (+39) 0546 066100 - Int. 925
> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>
> MagNews - E-mail Marketing Solutions
> http://www.magnews.it
> Diennea - Digital Marketing Solutions
> http://www.diennea.com
>
>
>
>
>
> *Da:* pgsql-jdbc-owner(at)postgresql(dot)org [mailto:
> pgsql-jdbc-owner(at)postgresql(dot)org] *Per conto di *Vitalii Tymchyshyn
> *Inviato:* lunedì 13 ottobre 2014 17:34
> *A:* Craig Ringer
> *Cc:* Tom Lane; John R Pierce; PG-JDBC Mailing List
> *Oggetto:* Re: [JDBC] Disk buffering of resultsets
>
>
>
> Hello, again.
>
>
>
> Sorry for the pause, I had a really busy week. Yet it allowed me to think
> a little more.
>
> As for me, there are three independent goals that can be addressed
> independently:
>
>
>
> 1) Prevent OOMs
>
> Unfortunately this can be addressed with out of heap saving only. The way
> I did in draft would still OOM when secondary query comes.
>
> Note that it's not that unusual. It's usually used without any
> multithreading to perform a client-side join, e.g. when complicated
> inheritance scenario is in place or to load some dictionary data without
> much duplication (e.g. only few wide dictionary entries for the long
> query), ...
>
> I am still thinking to do it without much parsing (you will need record
> type and size and that's all, without field parsing) by simply copying
> as-is to temp file. Pluggable interfaces can be done later if needed.
>
>
>
> 2) Fast first record
>
> Here we need to introduce strategies for "who is doing copying and when"
> from (1). I propose pluggable strategies with few predefined (see below).
> User can pass predefined strategy name or an Executor as a DataSource
> parameter or static method reference that returns an Executor when a string
> is needed (e.g. in connection URI). This would also allow to easily point
> to Executors.* methods. We may think about ScheduledExecutor requirement to
> also reuse it for QueryTimeout stuff.
>
>
>
> I propose to have next predefined strategies:
>
> a) Direct executor, that does all loading at the very beginning,
> potentially saving to a temp file.
>
> b) Postponed executor, that works much like in my draft: reads as needed
> without any disk saving. Performs disk saving only when connection is
> needed for some other statement.
>
> c) JVM-wide Executors.newCachedThreadPool that will start offloading in
> parallel as fetchSize is reached.
>
>
>
> Also I'd propose to set default fetchSize to some reasonable value, like
> 1000 and specify one of the strategies (e.g (a)) as default so that we
> won't get OOM on default settings. Or we should allow to set default fetch
> size on connection/data source level (or both).
>
>
>
> 3) Fast cancel/resultset close.
>
> It's the only place where switching to portals is needed as far as I can
> see and it can be done orthogonal to (1) and (2). I don't see any other
> goal that will benefit from it. To be honest, I am willing to do (1) and
> (2), but not (3) because this would mean me to get much deeper into the
> protocol I know almost nothing about right now.
>
>
>
> Best regards, Vitalii Tymchyshyn.
>
> ------------------------------
> Rimani aggiornato sul mondo dell’email marketing e del digital marketing:
> visita il nostro blog! http://blog.magnews.it
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Enrico Olivelli - Diennea 2014-10-21 10:47:43 R: Disk buffering of resultsets
Previous Message Dave Cramer 2014-10-15 02:30:22 Re: Disk buffering of resultsets