Re: problem with lost connection while running long PL/R query

From: "David M(dot) Kaplan" <david(dot)kaplan(at)ird(dot)fr>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ryan Kelly <rpkelly22(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: problem with lost connection while running long PL/R query
Date: 2013-05-17 08:18:23
Message-ID: 5195E7CF.3020208@ird.fr
Lists: pgsql-general

Hi Joe,

Thanks for responding, as you are clearly the expert on this sort of
problem. My current function already pages through the data using a
cursor precisely to avoid out-of-memory problems, which is why I am
somewhat surprised and stumped as to how this can be happening. It does
return all the data at once at the end, but a single call to the
function seems to work fine, so I can't see why four calls wouldn't. I
am currently planning to do some test runs using memory.profile() to see
if each successive call to the PL/R function is somehow accumulating
memory somewhere. Perhaps I am not properly closing a query or something
like that?
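For reference, here is roughly how I plan to instrument things (a
sketch only; it assumes PL/R's pg.thrownotice() for emitting messages,
and just dumps R's allocation counts so successive calls can be
compared):

```sql
CREATE OR REPLACE FUNCTION check_plr_memory() RETURNS void AS $$
  # Report R's allocation state; a steady climb across successive
  # calls would point to something accumulating between invocations.
  mp <- memory.profile()
  pg.thrownotice(paste(names(mp), mp, sep = "=", collapse = " "))
  gc()  # force a collection so the next report reflects live objects only
$$ LANGUAGE plr;
```

Calling this before and after each run of the real function should show
whether the interpreter's footprint grows from one call to the next.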

I am attaching my code; perhaps you will have some insight. To give
you a basic idea of what I am trying to do: I have separately developed
a classification model for the state of a "system" based on the data in
a PostgreSQL table, and I want to apply that model to each row of the
table. I loop over a cursor and predict the state for batches of
10,000 rows at a time.
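In outline, the attached function follows this pattern (a simplified
sketch; the table name, the query, and the commented-out predict() call
are placeholders, not my actual code):

```sql
CREATE OR REPLACE FUNCTION predict_states() RETURNS SETOF text AS $$
  # Page through the table with an SPI cursor, scoring 10,000 rows at
  # a time so the full table is never held in R's memory at once.
  out <- character(0)
  plan <- pg.spi.prepare("SELECT * FROM system_data")
  curs <- pg.spi.cursor_open("batch_curs", plan)
  repeat {
    batch <- pg.spi.cursor_fetch(curs, TRUE, as.integer(10000))
    if (is.null(batch) || nrow(batch) == 0) break
    # score one batch; "model" would be the separately fitted classifier
    # out <- c(out, as.character(predict(model, newdata = batch)))
    rm(batch); gc()   # drop the batch before fetching the next one
  }
  pg.spi.cursor_close(curs)
  out
$$ LANGUAGE plr;
```

So each iteration should only ever hold one 10,000-row batch, which is
why the growth across repeated calls puzzles me.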

Thanks again for the help.

Cheers,
David

On 05/16/2013 11:40 PM, Joe Conway wrote:
>
> On 05/16/2013 08:40 AM, Tom Lane wrote:
>> "David M. Kaplan" <david(dot)kaplan(at)ird(dot)fr> writes:
>>> Thanks for the help. You have definitely identified the problem,
>>> but I am still looking for a solution that works for me. I tried
>>> setting vm.overcommit_memory=2, but this just made the query
>>> crash quicker than before, though without killing the entire
>>> connection to the database. I imagine that this means that I
>>> really am trying to use more memory than the system can handle?
>>> I am wondering if there is a way to tell postgresql to flush a
>>> set of table lines out to disk so that the memory they are using
>>> can be liberated.
>> Assuming you don't have work_mem set to something unreasonably
>> large, it seems likely that the excessive memory consumption is
>> inside your PL/R function, and not the fault of Postgres per se.
>> You might try asking in some R-related forums about how to reduce
>> the code's memory usage.
> The two "classic" approaches to this with PL/R are either create a
> custom aggregate with the PL/R as the final function (i.e. work on one
> group at a time) or use the SPI cursor functionality within the PL/R
> function and page your data using a cursor. Not all forms of analysis
> lend themselves to these approaches, but perhaps yours does.
>
> Ultimately I would like to implement a form of R data.frame that does
> the paging with a cursor transparently for you, but I have not been
> able to find the time so far.
>
> Joe
>
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support
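P.S. If I understand the custom-aggregate approach you mention, it
would look roughly like this (a hypothetical sketch; the aggregate
name, column names, and the placeholder final function are mine, and
the real final function would call predict() on the fitted model):

```sql
-- Group the rows in SQL; array_append accumulates one group's values,
-- and a PL/R final function scores one group at a time.
CREATE OR REPLACE FUNCTION predict_group_final(float8[]) RETURNS text AS $$
  # arg1 holds one group's accumulated values; only this group is in memory
  as.character(mean(arg1))  # placeholder for predict(model, ...)
$$ LANGUAGE plr;

CREATE AGGREGATE predict_group (float8) (
  sfunc = array_append,
  stype = float8[],
  initcond = '{}',
  finalfunc = predict_group_final
);

-- usage: one prediction per group, never the whole table at once
-- SELECT system_id, predict_group(measurement)
--   FROM system_data GROUP BY system_id;
```

I will look into whether my classification step can be restated
per-group like this.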

--
**********************************
David M. Kaplan
Chargé de Recherche 1

Institut de Recherche pour le Développement
Centre de Recherche Halieutique Méditerranéenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sète cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95

http://www.umr-eme.org/team/dkaplan/
http://www.amped.ird.fr/
**********************************

Attachment Content-Type Size
functions_model_prediction.sql text/x-sql 2.5 KB
