Re: Slow concurrent processing

From: Misa Simic <misa(dot)simic(at)gmail(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow concurrent processing
Date: 2013-03-12 17:11:46
Message-ID: CAH3i69mP_W=1gUpty7Me8vWpRShUWffsD=wepTM=JQeFVCVYdw@mail.gmail.com
Lists: pgsql-performance

Thanks Steve,

Of course I thought about the limits... I just didn't think there was that kind
of problem (CPU/memory/I/O), because there is no degradation elsewhere while the
long-running process is going - i.e. some complex query takes a similar time
whether the long-running process is idle or under way (and that query also uses
the tables involved in the long-running do_the_math function - though of course
it doesn't ask for the rows the long-running function will produce - it wouldn't
get them anyway even if it asked...).

All the processing is inside Postgres... (but there are no updates at all -
updates would have pointed me directly to a potential row-lock problem...)
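
(A quick way to double-check that while the concurrent jobs run is to look for
ungranted locks - something like the query below; the pid/query column names are
the 9.2 ones, older versions use procpid/current_query:)

SELECT a.pid, a.query, l.locktype, l.mode, l.relation::regclass AS relation
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;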

Processing one record is itself a deeper sequential thing with a lot of if/else
etc...

Something like:

GetMasterInfo about RecordID (join several settings tables related to the input
RecordID)

if that RecordID is of a given type then
    apply_calculation1(recordID)
else
    apply_calculation2(recordID)
and so on...
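
As a very rough sketch in PL/pgSQL (the table and column names - master_record,
record_type - are just simplified placeholders, only do_the_math,
apply_calculation1 and apply_calculation2 are the real names):

CREATE OR REPLACE FUNCTION do_the_math(p_record_id integer)
RETURNS void AS $$
DECLARE
    v_type text;
BEGIN
    -- "GetMasterInfo": join the settings tables for this record
    SELECT m.record_type
      INTO v_type
      FROM master_record m
     WHERE m.record_id = p_record_id;

    IF v_type = 'type1' THEN
        PERFORM apply_calculation1(p_record_id);
    ELSE
        PERFORM apply_calculation2(p_record_id);
    END IF;
    -- ...and so on for the other types/branches
END;
$$ LANGUAGE plpgsql;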

Then, for example, apply_calculation1 says:

get all records for this recordID within the related period (from the tracking
tables);
for each day, take the status for that day, calculate the hours that fall into
the different time periods during the day, and use a different rate for each -
but that rate, in some cases, depends on the total hours spent in the week that
day belongs to for that record_id, etc. etc...
So basically: insert into result_table1 the amounts split by category for each
day, applying different calculations per category...
Then later sum things from result_table1 and insert them into result_table2...
and do further calculations based on the info in result_table2, inserting the
results back into the same table...
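
In a very rough sketch (table and column names - tracking, rates, hours, category,
amount - are simplified placeholders, and the real per-day/weekly rate rules are
left out), apply_calculation1 is basically two insert stages:

CREATE OR REPLACE FUNCTION apply_calculation1(p_record_id integer)
RETURNS void AS $$
BEGIN
    -- stage 1: split each day's hours by category at the applicable rate
    INSERT INTO result_table1 (record_id, day, category, amount)
    SELECT t.record_id, t.day, t.category, sum(t.hours * r.rate)
      FROM tracking t
      JOIN rates r ON r.category = t.category  -- real rate also depends on weekly totals
     WHERE t.record_id = p_record_id           -- and the related period
     GROUP BY t.record_id, t.day, t.category;

    -- stage 2: summarise stage 1 into result_table2; further calculations
    -- then read from and insert back into result_table2
    INSERT INTO result_table2 (record_id, category, total_amount)
    SELECT record_id, category, sum(amount)
      FROM result_table1
     WHERE record_id = p_record_id
     GROUP BY record_id, category;
END;
$$ LANGUAGE plpgsql;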

All that math for one thing lasts 0.5 to 2 secs, depending on a lot of things
etc...

sleep(1) was just a simplified stand-in for the time the real processing needs...
not a way around hardware limits and bandwidth :)

Just the fact that we can run a complex query while the long-running function is
in progress told me there are no hardware resource problems...

Many thanks,

Misa

2013/3/12 Steve Crawford <scrawford(at)pinpointresearch(dot)com>

> On 03/12/2013 08:06 AM, Misa Simic wrote:
>
>> Thanks Steve
>>
>> Well, the full story is too complex - but the point was - whatever the
>> blackbox does, it lasts 0.5 to 2 secs per processed record (maybe I was wrong,
>> but I thought the reason why it takes the time it needs to actually do the
>> task - CPU/IO/memory, whatever - is not that important....) - so I really
>> don't see a difference between: call a web service, insert a row in the table
>> (takes 3 secs) and sleep 3 seconds - insert the result in the table...
>>
>> if we do the above task for two things sequentially - it will last 6 secs... but
>> if we do it "concurrently" - it should last 3 secs... (in theory :) )
>>
>
> Not at all - even in "theory." Sleep involves little, if any, contention
> for resources. Real processing does. So if a process requires 100% of
> available CPU then one process gets it all while many running
> simultaneously will have to share the available CPU resource and thus each
> will take longer to complete. Or, if you prefer, think of a file download.
> If it takes an hour to download a 1GB file it doesn't mean that you can
> download two 1GB files concurrently in one hour even if "simulating" the
> process by a sleep(3600) suggests it is possible.
>
> I should note, however, that depending on the resource that is limiting
> your speed there is often room for optimization through simultaneous
> processing - especially when processes are CPU bound. Since PostgreSQL
> associates each back-end with one CPU *core*, you can have a situation
> where one core is spinning and the others are more-or-less idle. In those
> cases you may see an improvement by increasing the number of simultaneous
> processes to somewhere shy of the number of cores.
>
>
>
>> I guessed there was a lock somewhere - but it wasn't clear where/why when
>> there are no updates - just inserts...
>>
>> But I didn't know that during INSERT a row lock is taken on the referenced
>> tables as well - via the FK columns...
>>
>> So I guess now that is the cause of the problem...
>>
>> We will see how it goes with inserts into unlogged tables with no FK...
>>
>>
> It will almost certainly go faster as you have eliminated integrity and
> data-safety. This may be acceptable to you (non-real-time crunching of data
> that can be reloaded from external sources or temporary processing that is
> ultimately written back to durable storage) but it doesn't mean you have
> identified the actual cause.
>
> One thing you didn't state. Is all this processing taking place in
> PostgreSQL? (i.e. update foo set bar = do_the_math(baz, zap, boom)) where
> do_the_math is a PL/pgSQL, PL/Python, ... or are external processes
> involved?
>
> Cheers,
> Steve
>
>
