Re: Out of memory error on huge resultset

From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Dror Matalon <dror(at)zapatec(dot)com>
Cc: pgsql-jdbc <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Out of memory error on huge resultset
Date: 2002-10-11 17:59:41
Message-ID: 1034359185.3461.114.camel@inspiron.cramers
Lists: pgsql-general pgsql-hackers pgsql-jdbc

Agreed, but there are selects where count(*) won't work. Even so, what
we are talking about here is hiding the implementation of cursors behind
the result set. What I would envision is some sort of caching: when the
user sets the fetch size to 10, for instance, we do the select, and when
they ask for next() we check whether we have those rows in the cache,
fetching them 10 at a time if necessary, possibly keeping one set of ten
behind our current position and one set of ten ahead of it. Recalling
that ResultSets support absolute positioning, as well as first() and
last(), we need the ability to move with the minimum number of trips to
the backend.
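A rough sketch of the caching described above (all names and types are illustrative, not driver code; in a real implementation the fetcher would be a backend round trip such as MOVE/FETCH on a server-side cursor):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Illustrative sketch: keep a window of rows around the current position
// so that next() rarely needs a trip to the backend.
class RowWindow {
    private final int fetchSize;
    // (startRow, rowCount) -> rows; stands in for a backend round trip,
    // e.g. MOVE ABSOLUTE + FETCH FORWARD on a server-side cursor.
    private final BiFunction<Integer, Integer, List<String[]>> fetcher;
    private List<String[]> cache = new ArrayList<>();
    private int cacheStart = 0; // absolute index of cache.get(0)
    private int current = -1;   // absolute index of the current row

    RowWindow(int fetchSize, BiFunction<Integer, Integer, List<String[]>> fetcher) {
        this.fetchSize = fetchSize;
        this.fetcher = fetcher;
    }

    // Advance one row; hit the backend only when the row falls outside the
    // cached window (one block behind, the current block, one block ahead).
    String[] next() {
        current++;
        if (current < cacheStart || current >= cacheStart + cache.size()) {
            cacheStart = Math.max(0, current - fetchSize);
            cache = fetcher.apply(cacheStart, 3 * fetchSize);
        }
        int i = current - cacheStart;
        return i < cache.size() ? cache.get(i) : null; // null = past the end
    }
}
```

With fetchSize 10, reading the first 35 rows costs only two backend trips instead of 35.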

As it turns out, the MOVE command in postgres does support moving to
the end (MOVE 0); at the moment this is considered a bug and is on the
todo list to be removed. I expect we can get some sort of replacement
that allows us to move to the end (MOVE END).
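If some move-to-end spelling survives, last() over a cursor could be issued as a short command sequence. A speculative sketch (cursor name is hypothetical, and "MOVE 0" is only the current, soon-to-be-removed behavior):

```java
import java.util.List;

// Speculative sketch of the commands a driver might send to position a
// server-side cursor on its last row: move past the end, then fetch one
// row backward.
class CursorEnd {
    static List<String> lastRowCommands(String cursor) {
        return List.of(
            "MOVE 0 IN " + cursor,             // current spelling: move to end
            "FETCH BACKWARD 1 FROM " + cursor  // the last row
        );
    }
}
```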

Dave

On Fri, 2002-10-11 at 13:12, Dror Matalon wrote:
>
> Hi,
>
> I'm jumping in late into this discussion but ...
>
> In my mind a lot of these features break the model. From an application
> perspective, if I want last(), I do a count(*) and then a fetch with a
> limit. Not quite the same, but all these methods of fetching the whole
> data set locally and manipulating it largely defeat the purpose. Let the
> backend do the work, instead of trying to replicate the functionality in
> JDBC.
>
> That said, I do understand that some of these are required by the JDBC
> 2.0 spec.
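Dror's count-then-limit approach, sketched as the SQL a client might generate (table and ordering column are illustrative; note that without an ORDER BY the "last" row isn't well defined anyway):

```java
// Illustrative sketch: emulate last() on the backend by counting the rows,
// then fetching only the final one with LIMIT/OFFSET, instead of pulling
// the whole result set into client memory.
class LastViaLimit {
    static String countSql(String table) {
        return "SELECT count(*) FROM " + table;
    }

    static String lastRowSql(String table, String orderCol, long rowCount) {
        return "SELECT * FROM " + table + " ORDER BY " + orderCol
             + " LIMIT 1 OFFSET " + (rowCount - 1);
    }
}
```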
>
> Dror
>
> On Fri, Oct 11, 2002 at 01:05:37PM -0400, Aaron Mulder wrote:
> > It wouldn't be bad to start with a naive implementation of
> > last()... If the only problem we have is that last() doesn't perform
> > well, we're probably making good progress. :)
> > On the other hand, I would think the updateable result sets would
> > be the most challenging; does the server provide any analogous features
> > with its cursors?
> >
> > Aaron
> >
> > On 11 Oct 2002, Dave Cramer wrote:
> > > This really is an artifact of the way that postgres gives us the data.
> > >
> > > When you query the backend you get *all* of the results of the query,
> > > and there is no indication of how many results you are going to get. For
> > > simple selects it would be possible to get some idea by using
> > > count(field), but this wouldn't work often enough to be useful. So that
> > > leaves us with cursors, which still won't tell you how many rows you are
> > > getting back, but at least you won't have the memory problems.
> > >
> > > This approach is far from trivial, which is why it hasn't been
> > > implemented yet. Keep in mind that result sets support things like
> > > move(n), first(), and last(), the last of which will be the trickiest.
> > > Not to mention updateable result sets.
> > >
> > > As it turns out there is a mechanism to get to the end of a cursor
> > > (MOVE 0), which is currently considered a bug.
> > >
> > > Dave
> > >
> > > On Fri, 2002-10-11 at 11:44, Doug Fields wrote:
> > > > At 08:27 AM 10/11/2002, snpe wrote:
> > > > >Barry,
> > > > > Is it true?
> > > > >I created a table with one varchar(500) column and entered 1 million rows
> > > > >of 10-20 characters each. The JDBC query 'select * from a' gets an 'out of
> > > > >memory' error, but psql does not.
> > > > >I inserted 8 million rows and psql still works fine (slow, but it works)
> > > >
> > > > The way the code works in JDBC is, in my opinion, a little poor but
> > > > possibly mandated by JDBC design specs.
> > > >
> > > > It reads the entire result set from the database backend and caches it in a
> > > > horrible Vector (which should really be a List and which should at least
> > > > make an attempt to get the # of rows ahead of time to avoid all the
> > > > resizing problems).
> > > >
> > > > Then, it doles it out from memory as you go through the ResultSet with the
> > > > next() method.
> > > >
> > > > I would have hoped (but was wrong) that it streamed - WITHOUT LOADING THE
> > > > WHOLE THING - through the result set as each row is returned from the
> > > > backend, thus ensuring that you never use much more memory than one line.
> > > > EVEN IF you have to keep the connection locked.
> > > >
> > > > The latter is what I expected it to do; the former is what it does. So it
> > > > necessitates converting EVERY SELECT query which you think has more than
> > > > a few rows (or which you think COULD have more than a few rows, "few" being
> > > > defined by your VM memory limits) into a cursor-based query. Really kludgy.
> > > > I intend to write a class to do that for every SELECT query for me
> > > > automatically.
> > > >
> > > > Cheers,
> > > >
> > > > Doug
> > > >
> > > >
> > > > >In the C library there is 'execute query' without fetch; in JDBC, execute
> > > > >fetches all rows, and that is the problem. I think executeQuery should
> > > > >prepare the query, and fetching (ResultSet.next or ...) should fetch only
> > > > >fetchSize rows at a time.
> > > > >I am not sure, but I think this is a problem with JDBC, not PostgreSQL.
> > > > >Hackers?
> > > > >Does psql fetch all rows, and if not, how many?
> > > > >Can I change the fetch size in psql?
> > > > >CURSOR, FETCH and MOVE aren't a solution: if I use JDBC in a third-party
> > > > >IDE, I can't force this workaround.
> > > > >
> > > > >regards
> > > > >
> > > > >On Thursday 10 October 2002 06:40 pm, Barry Lind wrote:
> > > > > > Nick,
> > > > > >
> > > > > > This has been discussed before on this list many times. But the short
> > > > > > answer is that that is how the postgres server handles queries. If you
> > > > > > issue a query the server will return the entire result. (try the same
> > > > > > query in psql and you will have the same problem). To work around this
> > > > > > you can use explicit cursors (see the DECLARE CURSOR, FETCH, and MOVE
> > > > > > sql commands for postgres).
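The workaround Barry describes, sketched as JDBC calls (the cursor name, chunk size, and helper class are illustrative, not driver or thread code):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch of the explicit-cursor workaround: DECLARE a cursor
// for the query, then FETCH it in chunks so at most `chunk` rows are held
// in driver memory at a time.
class CursorFetch {
    static String declareSql(String cursor, String query) {
        return "DECLARE " + cursor + " CURSOR FOR " + query;
    }

    static String fetchSql(String cursor, int chunk) {
        return "FETCH FORWARD " + chunk + " FROM " + cursor;
    }

    // Needs a live connection; cursors only exist inside a transaction,
    // hence autocommit must be off.
    static void streamQuery(Connection con, String query, int chunk) throws SQLException {
        con.setAutoCommit(false);
        Statement st = con.createStatement();
        try {
            st.execute(declareSql("huge_cur", query));
            boolean more = true;
            while (more) {
                more = false;
                ResultSet rs = st.executeQuery(fetchSql("huge_cur", chunk));
                while (rs.next()) {
                    more = true;
                    // process the row here, e.g. rs.getString(1)
                }
                rs.close();
            }
            st.execute("CLOSE huge_cur");
            con.commit();
        } finally {
            st.close();
        }
    }
}
```

The loop ends when a FETCH returns zero rows; memory use is bounded by the chunk size rather than the result size.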
> > > > > >
> > > > > > thanks,
> > > > > > --Barry
> > > > > >
> > > > > > Nick Fankhauser wrote:
> > > > > > > I'm selecting a huge ResultSet from our database- about one million rows,
> > > > > > > with one of the fields being varchar(500). I get an out of memory error
> > > > > > > from java.
> > > > > > >
> > > > > > > If the whole ResultSet gets stashed in memory, this isn't really
> > > > > > > surprising, but I'm wondering why this happens (if it does), rather than
> > > > > > > a subset around the current record being cached and other rows being
> > > > > > > retrieved as needed.
> > > > > > >
> > > > > > > If it turns out that there are good reasons for it to all be in memory,
> > > > > > > then my question is whether there is a better approach that people
> > > > > > > typically use in this situation. For now, I'm simply breaking up the
> > > > > > > select into smaller chunks, but that approach won't be satisfactory in
> > > > > > > the long run.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > -Nick
> > > > > > >
> > > > > > > -------------------------------------------------------------------------
> > > > > > > Nick Fankhauser   nickf(at)ontko(dot)com   Phone 1.765.935.4283   Fax 1.765.962.9788
> > > > > > > Ray Ontko & Co.   Software Consulting Services   http://www.ontko.com/
> > > > > > >
> > > > > > >
> > > > > > > ---------------------------(end of broadcast)---------------------------
> > > > > > > TIP 5: Have you checked our extensive FAQ?
> > > > > > >
> > > > > > > http://www.postgresql.org/users-lounge/docs/faq.html
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>
> --
> Dror Matalon
> Zapatec Inc
> 1700 MLK Way
> Berkeley, CA 94709
> http://www.zapatec.com
>
