From: Sim Zacks <sim(at)compulab(dot)co(dot)il>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: background triggers?
Date: 2006-05-24 11:36:18
Message-ID: e51d37$kpe$1@news.hub.org
Lists: pgsql-general
My understanding of Listen/Notify is that the listener is a completely
disconnected process running on the database server.
It should be run as a daemon (or a service on Windows) application. The
trigger on the table calls the NOTIFY SQL command and then the trigger,
and thereby the statement, ends.
The notification then tells this daemon application to wake up and start
processing.
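A minimal sketch of the trigger side in PL/pgSQL (table, function, and channel names here are my own illustrations, not from the thread):

```sql
-- Illustrative statement-level trigger: fire one NOTIFY per batch INSERT.
CREATE OR REPLACE FUNCTION notify_batch_loaded() RETURNS trigger AS $$
BEGIN
    NOTIFY batch_loaded;   -- wakes any daemon that has issued LISTEN batch_loaded
    RETURN NULL;           -- return value is ignored for AFTER ... STATEMENT triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_batch_loaded
    AFTER INSERT ON transactions
    FOR EACH STATEMENT EXECUTE PROCEDURE notify_batch_loaded();
```

The statement finishes as soon as the trigger returns; the daemon picks the work up asynchronously.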
To make sure that a second process does not start while the first
process is running, you should have a running-processes table which gets
a row inserted when the process starts and updated when it ends. That
way your process can check whether one is currently running or not.
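A hedged sketch of such a guard table (all names are assumptions):

```sql
-- Illustrative guard: the primary key makes a second concurrent start fail fast.
CREATE TABLE running_processes (
    proc_name text PRIMARY KEY,
    started   timestamp NOT NULL DEFAULT now()
);

-- Worker start: a duplicate-key error here means a run is already in progress.
INSERT INTO running_processes (proc_name) VALUES ('account_update');

-- Worker end: release the slot.
DELETE FROM running_processes WHERE proc_name = 'account_update';
```

A crashed worker leaves its row behind, so in practice a stale-entry timeout (comparing `started` against `now()`) is worth adding.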
Another way of doing it is to have a cron job check every X minutes for
records in the table. When records are present, it runs the process. It
can also keep a flag that says "don't start another process until this
one is finished."
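The cron variant might look like this, with the "don't run another one" flag done as an on-disk lock (script name, database name, and the SQL function are assumptions, not from the thread):

```shell
#!/bin/sh
# process_pending.sh - poll for queued work; meant for a crontab line such as:
#   */5 * * * * /usr/local/bin/process_pending.sh
# mkdir is atomic, so the lock directory doubles as the
# "don't start another process until this one is finished" flag.
LOCK="${TMPDIR:-/tmp}/process_pending.lock"
if mkdir "$LOCK" 2>/dev/null; then
    trap 'rmdir "$LOCK"' EXIT
    # The real job would be something like:
    #   psql -d mydb -c "SELECT process_pending_records()"
    echo "processing pending records"
else
    echo "previous run still active, skipping" >&2
fi
```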
I may not have understood exactly what you are trying to do, but from
what I understood, this will solve your problem.
Sim
Rafal Pietrak wrote:
> On Tue, 2006-05-23 at 13:01 -0400, Kenneth Downs wrote:
>> Rafal Pietrak wrote:
>>> some other INSERT, return immediately if so, but turn into background for
>>> a long-lasting job if not".
>>>
>>>
>>>
>> Rafal, I'm wondering why you want to do this. You may be "fighting the
>> framework".
>
> Yes, most probably. I'm afraid of that :(
>
> Still...
>
>> If you are trying to do something that is totally unsupported, it is
>> probably for a pretty good reason, usually dealing with security or data
>> loss. You can probably get what you want by supported methods, but it
>> may require looking at the problem in a different way.
>>
>> What is it you are trying to accomplish? Is it just performance?
>
>
> OK, here is a 'real life' example. It works more like a post office. Now
> and then a 'delivery man' arrives with a bunch of post to deliver. The
> post office takes the batch, checks and stamps each and every item, and
> hands over a receipt. But the actual bin-ing (into delivery channels)
> and routing (dispatch) is handled without the delivery man standing and
> waiting for the process to end.
>
> In my *real*life* case I have a file with hundreds of thousands of
> 'transactions' uploaded by an operator (several times a day, and in
> fact, by 10-30 operators) - those are checked at INSERT time. But after
> that check and INSERT, I'd like to say to the operator: "OK, your job is
> done, don't worry about the rest".
>
> But there is more work to do with the batch.
>
> I have to 'route the transactions' to their relevant accounts, see
> how they change the 'status' of those accounts, and consequently store the
> updated status within the account itself. This is tedious and time
> consuming. But it has to be done, because 'customers' query account
> status for that 'aggregate status information', and it would be *very*
> heavy for the database if those queries required browsing the entire
> 'transaction log'. The ratio of 'transactions' to accounts is
> like millions to thousands.
>
> A plain INSERT of a batch takes 5-10 minutes on desktop postgresql (800MHz
> machine, ATA disks). When I attach a trigger (a *very* simple function) to
> update the accounts, the INSERT takes hours (2-4). But when I make just
> one single update of all accounts at the end of the batch insert, it
> takes 20-30 minutes.
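That last variant, one set-based UPDATE after the whole batch, might look like this (table and column names are guesses, not from the post):

```sql
-- Illustrative end-of-batch rollup: aggregate the freshly inserted
-- transactions once, then fold the sums into the accounts in one statement.
UPDATE accounts AS a
SET    balance = a.balance + t.delta
FROM  (SELECT account_id, sum(amount) AS delta
       FROM   transactions
       WHERE  batch_id = 1234              -- the batch just inserted
       GROUP  BY account_id) AS t
WHERE  a.id = t.account_id;
```

One sequential pass over the batch avoids millions of per-row trigger firings, which is consistent with the 20-30 minutes vs. 2-4 hours figures.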
>
> So my solution was 1) to have an additional table "TABLE dirty_log(time
> timestamp, who text)", which gets a row inserted *after* a batch of
> INSERTs, and 2) a 'server side process', run every 5-10 minutes, which
> updates the accounts, and which: A) does NOT launch when another such
> process is currently running; B) purges the DIRTY_LOG table after it's done.
> This is quite obviously a 'user space implementation' of the 'background
> triggers'. I needed that.
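The arrangement described above could be sketched like this (the dirty_log table is from the post; the worker body and the locking choice are assumptions):

```sql
CREATE TABLE dirty_log (
    time timestamp,
    who  text
);

-- Periodic worker, launched every 5-10 minutes:
BEGIN;
-- A) single-writer guard: a second worker blocks on this lock instead of
--    doubling the work (to skip rather than block, test for the lock first).
LOCK TABLE dirty_log IN ACCESS EXCLUSIVE MODE;
-- ... recompute the aggregate account status here ...
-- B) purge the log once the accounts are up to date.
DELETE FROM dirty_log;
COMMIT;
```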
>
> Naturally, having this 'background trigger' I lose accuracy of the
> account information. But I gain on system efficiency - this is an
> engineering decision. When we have to make such decisions (like the
> implementation above), it's good to have 'system tools' (like
> 'background triggers') that support us.
>
> But of course I may be wrong altogether. I'd really like to know the
> technique, 'along the lines' of RDBMS design, which serves
> that same purpose.
>
> Is there a better solution?
>
> NB: the 'batch INSERT' I mentioned above is done by a www server. It's
> quite vital to have the server process terminate (meaning: not keep
> it waiting for a LISTEN/NOTIFY event - meaning not use LISTEN/NOTIFY), as
> apache will keep the connection open until the process ends.
>
> In 'real life', this scenario also applies to a 'service network':
> 1. Say you have a network of 100_000 - 1000_000 vending machines (VMs).
> 2. Each is loaded with ca. 100 item types (300 types in the entire
> network).
> 3. Each VM dispenses an item every 2-3 minutes, which puts the overall
> 'transaction traffic' at the level of over hundreds per second.
> 4. Assume that for 'customer management' you need to store quite a bit
> of data with each item-dispense 'transaction'. Meaning: transactions are
> not very light, and their details have to be kept for a long time.
> 5. Obviously, you need to manage your stock (each of the 300
> item-types): you keep the VMs loaded and keep some stock at a central store.
>
> (ATMs are a good example of such a network)
>
> So:
> 1. The 'transaction traffic' is so significant that you really have to
> 'optimise for that'.
> 2. You don't really have to know *exactly* when you run out of stock,
> because each VM has a significant local item store; so if you get
> notified that a particular VM is getting close to the bottom on a
> particular item, you may dispatch a reload in 10 minutes, but it's also
> OK to dispatch it in 2 hours - meaning, the 'account information' does
> not have to be 'immediately accurate'. Far more important is 'dispatch
> transaction' performance.
> 3. Normally you 'keep an eye' on your VM network - meaning you issue a
> 'statistics' query quite frequently. If that was a 'heavy query' it
> would degrade your database performance quite significantly - we really
> need the 'aggregate information' stored within 'item-accounts'.
>
> Is there a clean, 'along the framework' design that serves this reality?
>
> -R
>
>
>