Re: plpgsql functions organisation

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Melvin Davidson <melvin6925(at)gmail(dot)com>, Alban Hertroys <haramrae(at)gmail(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Yves Dorfsman <yves(at)zioup(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: plpgsql functions organisation
Date: 2015-05-03 15:17:52
Message-ID: 55463C20.3080408@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 05/03/2015 07:14 AM, Melvin Davidson wrote:
> The point was to show that yes, function calls take time, and using sub
> functions take even more time. I am not about to write an additional
> more detailed example just to show the same results. If you are in
> doubt, I respectfully suggest you do your own testing.

Can't resist a challenge. I took an existing function that calculates an
aggregated attendance count for a student or all enrolled students over
a date period and modified it to call sub functions. There are two sub
functions, one that finds the students enrolled over a period(which by
the way calls another function) and dates range they where enrolled. The
other calculates the aggregate values for each student. The original
function is student_attendance, the modified student_attendance_sub. The
results are below, where the first argument is the student_id(where 0
equals all students). The all students version returns 600 rows, the
single student 16 rows.

hplc=> select * from student_attendance(0, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.204865s
hplc=> select * from student_attendance_sub(0, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.014101s
hplc=> select * from student_attendance(0, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.041182s
hplc=> select * from student_attendance_sub(0, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.011385s
hplc=> select * from student_attendance(0, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.040762s
hplc=> select * from student_attendance_sub(0, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.016506s

hplc=> select * from student_attendance(1, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.00291s
hplc=> select * from student_attendance_sub(1, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.004125s
hplc=> select * from student_attendance(1, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.001907s
hplc=> select * from student_attendance_sub(1, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.003476s
hplc=> select * from student_attendance(1, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.00597s
hplc=> select * from student_attendance_sub(1, '2005-01-17', '2008-01-31');
NOTICE: Time 00:00:00.003986s

Definite difference in the all students run, probably because one of the
called functions is used in a LOOP and caching applies.

>
> On Sun, May 3, 2015 at 5:26 AM, Alban Hertroys <haramrae(at)gmail(dot)com
> <mailto:haramrae(at)gmail(dot)com>> wrote:
>
>
> > On 03 May 2015, at 2:56, Melvin Davidson <melvin6925(at)gmail(dot)com <mailto:melvin6925(at)gmail(dot)com>> wrote:
> >
> > OK, Here is a simple example that shows the difference between using a self contained function and
> > one that calls sub functions.
> >
> > After loading all the functions below, repeat each of the EXPLAIN statements a few times and note that
> > callsubs takes almost TWICE as long to execute as nosub.
> >
> > CREATE OR REPLACE FUNCTION nosub(text)
> > RETURNS void AS
> > $BODY$
> ...
> > IF LENGTH(p_in_str) <= 6
> > THEN RAISE NOTICE 'Hi %', p_in_str;
> > ELSE
> > RAISE NOTICE 'Hello %', p_in_str;
> > END IF;
> >
> > RETURN;
> > END;
> > $BODY$
> …
>
>
> > CREATE OR REPLACE FUNCTION called1(text)
> > RETURNS void AS
> > $BODY$
> ...
> > RAISE NOTICE 'Hi %', p_in_str1;
> >
> > RETURN;
> > END;
> > $BODY$
> …
>
> > CREATE OR REPLACE FUNCTION called2(text)
> > RETURNS void AS
> > $BODY$
> ...
> > RAISE NOTICE 'Hello %', p_in_str2;
> >
> > RETURN;
> > END;
> ...
>
>
> That's a rather uninteresting experiment, as all it does is call a
> function and raise a notice. Relative to what the functions do, the
> function call itself takes a significant amount of time. No surprise
> there, you'll see something similar in any language, even C. All
> you're showing is that calling a function takes some amount of time > 0.
>
> In C, a function call needs to look up an address to jump to, in
> plpgsql the database needs to look up the function body in a table.
> If the function is small and atomic it often gets called from
> multiple other functions and is probably cached anyway. The main
> difference between C and plpgsql here is that the latter is an
> interpreted language, so it does need to read in the entire function
> body after a call - which I'd expect to be quite a bit faster with a
> smaller (atomic) function body, especially when it hasn't been
> cached yet.
>
> So far I haven't been convinced.
>
> An actual use-case where the functions actually do something would
> be far more interesting. I doubt anybody writes functions just to
> raise a notice. I expect that in reality most plpgsql functions
> perform database queries and do something with the result. In such
> cases, function call overhead could be significant if the call is
> done for each record in a result set, for example. And even then
> it's worth considering whether that matters to your situation enough
> that it outweighs the usual benefits of code separation.
>
>
> > On Sat, May 2, 2015 at 7:37 PM, Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
> > On 05/02/2015 03:28 PM, Bill Moran wrote:
> > On Sat, 02 May 2015 15:06:24 -0700
> > Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
> >
> > On 05/02/2015 02:07 PM, Jeff Janes wrote:
> > On Sat, May 2, 2015 at 1:05 PM, Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com
> <mailto:adrian(dot)klaver(at)aklaver(dot)com>>> wrote:
> >
> > On 05/02/2015 10:12 AM, Melvin Davidson wrote:
> >
> > AFAIK, you cannot "package" functions in PostgreSQL,
> but it is
> > possible to
> > call a function from within a function.
> >
> > That being said, I would seriously look at how and why
> you are
> > writing
> > your functions
> > as functions that call other functions are not very
> efficient.
> >
> >
> > I am not following. That is what packaging is about,
> separating out
> > 'units of work' so they can be combined as needed. Part of
> that is
> > using existing functions in new functions/classes. In fact
> in the
> > Postgres source I see this in many places. Now it is entirely
> > possible I missed a memo, so I am open to a more detailed
> > explanation of the inefficiencies involved.
> >
> >
> > The Postgres source is written in C, not in plpgsql. C has a good
> > optimizing compiler and plpgsql doesn't.
> >
> > Does this actually matter? I am a biologist that backed into
> computing,
> > so I realize I am weak on the fundamentals. Still the scientist in me
> > wants data backing assertions. As I understand it plpgsql works
> close to
> > the server and is optimized to do so. I know writing in C would be a
> > better solution. Still is calling plpgsql functions inside plpgsql
> > really a bad thing when just considering plpgsql?
> >
> > The answer to that is the same answer to so many other things: it
> depends.
> >
> > plpgsql functions are slower than C. They also lack a lot of language
> > features that C has. That being said, if they're meeting your
> needs, then
> > don't worry about it. plpgsql is around because for most people,
> it works
> > well enough. There are certainly cases when you want to create
> very complex
> > logic in the database and plpgsql is liable to make that
> difficult. But
> > there are a lot of cases where having to manage pointers and a build
> > environment and all the things that go with C aren't justified,
> because
> > plpgsql has none of that complexity. There are advantages both ways.
> >
> > The beauty of PostgreSQL is that you have both available and you
> > can choose whichever is best for your situation.
> >
> > Agreed, though in my case I drop into plpythonu when I want more
> complex solutions.
> >
> >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> >
> >
> >
> > --
> > Melvin Davidson
> > I reserve the right to fantasize. Whether or not you
> > wish to share my fantasy is entirely up to you.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize. Whether or not you
> wish to share my fantasy is entirely up to you.

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Anton 2015-05-03 21:15:54 optimization join on random value
Previous Message Guillaume Lelarge 2015-05-03 14:59:52 Re: schedulers