Re: JDBC and processing large numbers of rows

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Guido Fiala <guido(dot)fiala(at)dka-gmbh(dot)de>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC and processing large numbers of rows
Date: 2004-05-12 11:36:41
Message-ID: 40A20C49.6090802@opencloud.com
Lists: pgsql-jdbc

Guido Fiala wrote:
> Reading all this, I'd like to know whether all this isn't just a tradeoff
> between _where_ the memory is consumed?
>
> If your JDBC client holds everything in memory, it gets an
> OutOfMemoryError.

Yes. The current driver attempts to hold the entire resultset in heap if
not using cursors. In theory, the driver could try to spill large
resultsets to disk (but see below).
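
For example, something like this should make the driver use a cursor behind
the scenes (untested sketch; the connection details and table name are made
up):

    Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost/test", "user", "password");
    // Cursor-based fetching only kicks in with autocommit off, a
    // forward-only resultset, and a non-zero fetch size.
    conn.setAutoCommit(false);
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(500);              // 500 rows per round trip
    ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
    while (rs.next()) {
        // process one row; only ~500 rows sit in the client heap at once
    }
    rs.close();
    stmt.close();
    conn.commit();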

> If your backend uses cursors, it caches the whole resultset and probably
> starts swapping and gets slow (it needs that memory for every user).

As I understand it (break out the salt!), the backend's cursor behaviour
depends on both the query and the cursor type. For a NO SCROLL cursor,
the backend is doing no more real work than for normal query retrieval
(it's just changing *when* the rows are retrieved). For a SCROLL cursor,
the backend may or may not need additional storage depending on the
complexity of the query. When the backend does need to store the
resultset, it will spill large resultsets to disk rather than keep them
in memory (and possibly this storage is more efficient than just storing
the raw tuples, as it could just reference the original tuples in the
main table itself -- not sure if this is how it actually works though).
For a WITH HOLD cursor, in addition to the SCROLL/NO SCROLL behaviour the
backend will preserve a copy of the resultset when it can no longer
safely derive the results directly from the database (i.e. when the
enclosing transaction commits).

The DECLARE documentation discusses some of this.
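
Driven from JDBC, the variants look roughly like this (untested sketch;
cursor and table names are made up, 'conn' is an open connection, and one
Statement is reused for brevity):

    conn.setAutoCommit(false);
    Statement stmt = conn.createStatement();

    // NO SCROLL: the backend just changes *when* rows are produced.
    stmt.execute("DECLARE c NO SCROLL CURSOR FOR SELECT * FROM big_table");
    ResultSet rs = stmt.executeQuery("FETCH FORWARD 100 FROM c");

    // SCROLL: may need extra backend storage, depending on the query.
    stmt.execute("DECLARE cs SCROLL CURSOR FOR SELECT * FROM big_table");
    stmt.execute("MOVE BACKWARD 10 IN cs");  // scrolling is now allowed

    // WITH HOLD: the backend preserves a copy of the resultset at commit,
    // so the cursor survives the end of the transaction.
    stmt.execute("DECLARE ch CURSOR WITH HOLD FOR SELECT * FROM big_table");
    conn.commit();                           // c and cs close; ch remains
    ResultSet rs2 = stmt.executeQuery("FETCH FORWARD 100 FROM ch");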

> If you use LIMIT and OFFSET, the database has to do more work to find the
> data snippet, and in the worst case (the last few records) still needs the
> whole resultset temporarily? (not sure here)

I'd guess that in the non-zero OFFSET case the backend skips and
discards the initial rows (why would it need to store them?), i.e. it's
similar in cost to doing a DECLARE/MOVE FORWARD/FETCH FORWARD sequence.
But you end up rerunning the start of the query every time you want some
more data.
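
i.e. roughly the difference between these two (again an untested sketch
with made-up names):

    // LIMIT/OFFSET: each page reruns the query, and the backend
    // skips and discards the first `offset` rows every time.
    PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM big_table ORDER BY id LIMIT ? OFFSET ?");
    ps.setInt(1, 100);     // page size
    ps.setInt(2, 10000);   // rows to skip -- recomputed on every call
    ResultSet page = ps.executeQuery();

    // Cursor equivalent: skip once, cheaply, without shipping the rows.
    stmt.execute("DECLARE c CURSOR FOR SELECT * FROM big_table ORDER BY id");
    stmt.execute("MOVE FORWARD 10000 IN c"); // discarded on the backend
    ResultSet next = stmt.executeQuery("FETCH FORWARD 100 FROM c");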

> Is that just a "choose your poison"? At least in the first case the
> memory of the client _gets_ used too, rather than putting all the load on
> the backend; on the other hand, most of the time the user does not really
> read all the data anyway, so it puts unnecessary load on all the hardware.

I lean towards using cursors. The backend can handle some cases more
efficiently than a client can, simply because it knows more about how to
generate the data and where it might already be stored. Also, it seems a
bit redundant to implement the same machinery for large datasets (spilling
to disk, etc.) on both the client and the server.

-O
