Re: Scheduler in Postgres

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Scheduler in Postgres
Date: 2004-12-18 02:46:34
Message-ID: m3y8fw1fxx.fsf@knuth.knuth.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In the last exciting episode, guyr(at)masergy(dot)com ("Guy Rouillier") wrote:
> Here is a real world example where a scheduler in PostgreSQL would be
> helpful. We collect usage statistics from our network throughout the
> day (raw stats.) After midnight, we roll up those raw stats into daily
> statistics.

> We have a very large amount of data, about 2 million rows a day a
> growing, so I want this whole operation done on the database server.
> It's all database work, just summing up data from one table and
> putting the result in another table. We have all that logic in a
> stored procedure. So why do I need to set up a cron job and a shell
> script whose only task is to connect to the database and start up
> the stored procedure? Wouldn't it be much simpler just to have a
> schedule in PostgreSQL that says "at 12:01, run this stored
> procedure"?

But that means Yet Another Scheduling User Interface to learn. I'd
really rather prefer to improve cron.

It's not as if DBMS connections are all that daunting to establish,
after all. Even on a "paranoid" system where we use MD5 passwords
everywhere, I can make the challenges go away using .pgpass

If you have a lot of jobs like this, it makes sense to me to use some
common "scheduling" layer that doesn't tell you "Oh, that's not
database stuff, so you'll have to learn some other scheduler!"

> Another advantage to having a scheduler in the database is to ease
> your DBA's job in maintenance, and to coordinate work by multiple
> systems.

I'd consider that a disadvantage, personally, as forcing it to be
usable solely for DBMS tasks means that you have to learn (and pick
between) multiple schedulers in order to manage things outside the
DBMS.

We run Slony-I, for instance. We might want to schedule activity for
it using a scheduler, but your "captive interface inside the DBMS"
won't work for that because slon daemons and slonik control scripts
run as processes that reside _outside_ the DBMS.

Similarly, I hate the idea of having multiple PIM "calendar" systems;
if tasks are getting scheduled in 4 places for me, personally, that
means I need to do a barrel of in-my-head context switching in order
to monitor each of those four places, and to try to keep them
consistent. I seem to recall someone doing a talk making that very
point at OSCON.
--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/sap.html
"...you might as well skip the Xmas celebration completely, and
instead sit in front of your linux computer playing with the
all-new-and-improved linux kernel version." -- Linus Torvalds

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Lonni J Friedman 2004-12-18 02:53:42 Re: replacements for vacuum?
Previous Message Timothy Perrigo 2004-12-18 00:43:08 Re: OSX 10.3.7 broke Postgresql 8.0.0b5?