Re: processing large amount of rows with plpgsql

From: Geert Mak <pobox(at)verysmall(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: processing large amount of rows with plpgsql
Date: 2012-08-09 05:57:25
Message-ID: 156BB7A4-1F36-4023-A885-77BDB2AF2699@verysmall.org
Lists: pgsql-general


On 08.08.2012, at 22:04, Merlin Moncure wrote:

> What is the general structure of the procedure? In particular, how
> are you browsing and updating the rows?

Here it is -

BEGIN
    FOR statistics_row IN SELECT * FROM statistics ORDER BY time ASC
    LOOP
        ...
        ... here some very minimal transformation is done
        ... and the row is written into the second table
        ...
    END LOOP;
    RETURN 1;
END;

> There is (almost) no way to
> force commit inside a function --

So what you are saying is that this behavior is normal, and we should either equip ourselves with enough disk space (which I am trying now; it is a cloud server, which I am resizing to gain more disk space and see what happens) or do it in an external (scripting) language?
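Since a plpgsql function runs inside a single transaction, the external-script route commits every N rows so the server never has to hold the whole job's transaction state at once. A minimal sketch of that pattern, using Python's stdlib sqlite3 only so the example is self-contained (against PostgreSQL you would use a driver such as psycopg2 with the same structure; the `value * 2` transformation and the `statistics_new` target table are placeholders for the elided details):

```python
import sqlite3

def copy_in_batches(conn, batch_size=10000):
    """Copy statistics into statistics_new, committing after every batch.

    Uses keyset pagination on the time column (assumed unique here),
    so each batch is read, written, and committed independently.
    """
    cur = conn.cursor()
    last_time = None
    while True:
        if last_time is None:
            cur.execute(
                "SELECT time, value FROM statistics "
                "ORDER BY time ASC LIMIT ?", (batch_size,))
        else:
            cur.execute(
                "SELECT time, value FROM statistics WHERE time > ? "
                "ORDER BY time ASC LIMIT ?", (last_time, batch_size))
        rows = cur.fetchall()
        if not rows:
            break
        # hypothetical "very minimal transformation": double the value
        cur.executemany(
            "INSERT INTO statistics_new (time, value) VALUES (?, ?)",
            [(t, v * 2) for t, v in rows])
        conn.commit()            # commit per batch, not per whole job
        last_time = rows[-1][0]  # resume after the last row seen
```

Each `conn.commit()` ends one small transaction, which is exactly what cannot be done from inside a single plpgsql function call.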

> there has been some discussion about
> stored procedure and/or autonomous transaction feature in terms of
> getting there.
>
> I say 'almost' because you can emulate some aspects of autonomous
> transactions with dblink, but that may not be a very good fit for your
> particular case.

I have already seen dblink mentioned in this context somewhere... Though if plpgsql performs well once there is more disk space, I'll leave it as is for now. This is a one-time operation.

Thank you,
Geert
From: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
To: Geert Mak <pobox(at)verysmall(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: processing large amount of rows with plpgsql
Date: 2012-08-09 07:11:55
Message-ID: C4DAC901169B624F933534A26ED7DF310861B65D(at)JENMAIL01(dot)ad(dot)intershop(dot)net
Lists: pgsql-general

> > There is (almost) no way to
> > force commit inside a function --
>
> So what you are saying is that this behavior is normal and we should
> either equip ourselves with enough disk space (which I am trying now,
> it is a cloud server, which I am resizing to gain more disk space and
> see what will happen) or do it with an external (scripting) language?
>

Hello,

a relatively simple way to work around your performance/resource problem is
to slice the update.

e.g.:

create function myupdate(slice int) ...

    for statistics_row in
        SELECT * FROM statistics
        WHERE id % 16 = slice
        -- or:
        WHERE hashtext(id::text) % 16 = slice
    ...

and then call your function once for each of the values 0..15 (when using 16 slices).

Use a power of 2 for the number of slices.

Using many slices may be faster, and it also lets you run the job in
parallel on a few threads.

HTH,

Marc Mamin
