ISD application limit

Dedicated to DataStage and DataStage TX editions featuring IBM® Service-Oriented Architectures.

Moderators: chulett, rschirm

qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

ISD application limit

Post by qt_ky »

We are seeing regular warnings in the SystemOut.log about 1-2 times a week:

Code:

00006595 ThreadPool    I   WSVR0652W: The size of thread pool "WebContainer" has reached 80 percent of its maximum.
00006561 ThreadPool    I   WSVR0652W: The size of thread pool "WebContainer" has reached 100 percent of its maximum.
We worked a Support case; it was confirmed that we are hitting a limit, and we were told that performance tuning is beyond the scope of Support and that we would need to purchase some sort of Services engagement.

We are running only 16 ISD applications, along with many DataStage and QualityStage jobs (mostly DataStage-only) that are scheduled throughout each day. We are running on IBM POWER8 with 6 cores and plenty of free memory. All server tiers are on the same box.

We found guidance from IBM that the thread pool should only be increased up to a limit of 10x the number of cores, or 60 in our case. Has anyone pushed past that ratio?

What is the limit or rule of thumb on the number of ISD applications that can be run simultaneously on a typical Information Server, where all server tiers are installed on the same computer?

Also, does anyone know the limit or rule of thumb on the number of ISD applications that can be run simultaneously on Information Server, when the server tiers are installed on separate computers?
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... this limit isn't configurable? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

We already bumped up the max to 60, for our 6 cores, based on the guidance found here (warning: heavy reading):

IBM WebSphere Application Server Performance Cookbook

"Thread pools need to be sized with the total number of hardware processor cores in mind."
...
"Good practice is to use 5 threads per server CPU core for the default thread pool, and 10 threads per server CPU for the ORB and Web container thread pools."

Has anyone else run into these types of warnings and tried to exceed the suggested ratio of 10 threads per server CPU?
Choose a job you love, and you will never have to work a day in your life. - Confucius
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I've never seen that particular message --- I wonder if it is related to "traffic"....meaning...struggles that WAS might be having as a bottleneck because of the number of requests coming in, as opposed to the number of Operations (Job Instances) that are spawned.

Does the error happen randomly when applications are banging away at your ISD services? ...or even when the services are idle and/or just trying to start up?

...if only or also when they are just idle...

I don't know where the "thread" numbers come from, but for years, a very good rule of thumb for ISD Job stability (keeping Jobs up and running, regardless of their traffic) has been to estimate about 15 osh processes "per core" for your real-time, always on Jobs. This is even if they are just "sitting there idle" not doing anything. The overhead to run that many Jobs concurrently (always on Jobs are the same as concurrent batch Jobs), especially if you have multiple instances per service, is a lot. You might need to analyze a DUMP score for your parallel Jobs (Server Jobs have a LOT fewer processes), but even a simple estimate of one osh process per Stage means that a 10 Stage always on Job with 3 instances is immediately spawning 30 osh processes.
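That estimate boils down to simple arithmetic (a sketch using the example figures from this post, assuming roughly one osh process per Stage per instance; parallel node counts would multiply it further):

```python
# Estimate resident osh processes for always-on parallel jobs,
# assuming ~1 osh process per Stage per instance. A DUMP score
# gives the real count; this is only a quick first estimate.
def estimate_osh_processes(stages, instances):
    return stages * instances

# The 10-Stage always-on job with 3 instances from the example above:
job_processes = estimate_osh_processes(10, 3)  # 30

# Rule-of-thumb stability capacity: ~15 osh processes per core.
cores = 6
budget = 15 * cores  # 90 on a 6-core box
print(job_processes, budget)
```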

As for traffic....if that's the issue here, it should be measurable (like, only happens at certain times of the day when 1000 users get on, etc.)...and that's another whole discussion.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Thanks for responding. The warnings consistently come in late afternoons, pretty much within minutes of each other, even across days and weeks.

We had initially assumed traffic would be the cause--that there must be an unusually large number of ISD requests hitting our server at once during that time of the afternoon. We have done numerous counts of our ISD audit logs across many dates when we saw thread warnings. We found that the traffic assumption was not supported by the numbers. In fact, the number of overall ISD requests was much greater at other times of the day when thread warnings have never been logged.

We later realized that we had one 30+ minute DataStage sequence scheduled to start about 5 to 10 minutes prior to the time of the thread warnings. We delayed the sequence's start time by 30 minutes to see if the thread warnings would follow. While they did not follow the sequence, they started happening about 20 minutes later than we had ever seen before, which is about 10 minutes before the sequence kicks off. So, we ruled that out also, and I have not yet been able to find a pattern that it does follow.

Thank you for sharing a rule of thumb estimate you knew about. That is new info to me and is what I was looking for as another avenue to investigate. Now we can begin counting something more appropriate and go from there.
Choose a job you love, and you will never have to work a day in your life. - Confucius
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The "symptom" that led to that rule of thumb is primarily stability...that Jobs that have to "restart" would fall over because there was too much on the system, or that randomly, when ISD would cycle a Job instance, it would put too much pressure on the machine.....hard to nail down, but there are other settings that can sometimes help, such as:

1. lock in your min/max to the same number. Don't let ISD try to decide when it needs more instances. If you need 5 as a worst case, make it always 5.

2. I am not looking at the client right now, but there is a setting that forces ISD never to "age out" an instance. It will remain running. That prevents it from even cycling "one" instance in/out. Play with that carefully, but it might reduce volatility here.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Thank you for the additional pointers.

Within the always-on ISD jobs, according to the score of each job, we have a total of 224 processes running. As noted earlier, this box has 6 cores, so that averages out to always running at least 37 processes per core. Another surprise: we found one job set to run on 4 nodes when each was supposed to be set to run on a single node. We are not, however, seeing any churn with ISD jobs stopping and new instances getting spawned. Perhaps we are on the borderline of pushing some undocumented limit.
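Checking those numbers against the ~15-processes-per-core rule of thumb mentioned earlier (a back-of-the-envelope sketch using the figures from this thread):

```python
# 224 always-on osh processes across 6 cores, versus the
# ~15-per-core stability estimate discussed earlier in the thread.
total_processes = 224
cores = 6

per_core = total_processes / cores  # ~37.3 processes per core
budget_per_core = 15

over_budget = per_core > budget_per_core
print(round(per_core, 1), over_budget)
```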

Our ISD min/max instance settings are mostly set to 1/2 while a small handful are set to 1/5 and one is set to 1/1.

All of our ISD applications have the "Infinite" setting checked under "Idle Time" (if that's the setting you were thinking of).
Choose a job you love, and you will never have to work a day in your life. - Confucius
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Sounds good and smartly calculated. It probably needs Support to find out the true origin of that error.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>