The Work Management team recently released PTF SI42845 for IBM i 7.1, which changes how IBM i manages jobs that exceed their CPU or temporary storage limits. Although it was introduced as a PTF for 7.1, this support is part of the base operating system in all subsequent releases.
The class object defines the processing attributes for a job. The routing entry in the subsystem description determines which class object is used when a job is initiated. Two of these processing attributes are Maximum processing unit time (CPUTIME) and Maximum temporary storage allowed (MAXTMPSTG), both of which default to *NOMAX. Before this PTF, if you specified values for these parameters, the job was ended as soon as it hit either limit: with CPC1218 (Job ended abnormally) for the maximum processing unit time, or with CPC1217 (Job ended abnormally) for the maximum temporary storage allowed. The cause text of each message tells you whether the job ended abnormally because the maximum CPU time was consumed or because the maximum temporary storage limit was exceeded.
When the system ended a job this way, it had no way of knowing whether the job was close to finishing its work. Given a little more CPU time or temporary storage, the job might have run to completion. Because it is difficult to predict the upper CPU or temporary storage limits a job requires, and because the job was ended as soon as a limit was hit, many customers simply left these values at their default setting.
This PTF changes the behavior so that jobs are no longer ended when they exceed their maximum processing unit time or their maximum temporary storage limit. Instead, the jobs are held. When the system holds a job for one of these reasons, it sends one of the following messages to the QSYSOPR message queue:
CPI112D – Job held by the system, CPUTIME limit exceeded
CPI112E – Job held by the system, MAXTMPSTG limit exceeded
This change lets the system operator decide whether a held job should be ended or allowed to run to completion.
If you want the jobs to continue to run, you must change the limit that was hit and then use the Release Job (RLSJOB) command (you can’t release a job that’s above the limit). To allow these values to be changed, the Change Job command and the Change Job APIs have been enhanced.
The Change Job (CHGJOB) command has been enhanced with two new parameters:
- Maximum CPU time (CPUTIME): The maximum CPU time parameter specifies the maximum processing unit time (in milliseconds) that the job can use. If the maximum time is exceeded, the job is held.
- Maximum temporary storage (MAXTMPSTG): The maximum temporary storage parameter specifies the maximum amount of temporary auxiliary storage (in megabytes) that the job can use. This temporary storage is used for storage required by the program itself and by implicitly created internal system objects used to support the job. (It doesn’t include storage for objects in the QTEMP library.) If the maximum temporary storage is exceeded, the job is held.
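The raise-the-limit-then-release sequence can be sketched as a small helper that builds the two CL commands as strings. This is only an illustration: the function name and the qualified job name 123456/QUSER/NIGHTLY are hypothetical, and the sketch assumes the units described above (milliseconds for CPUTIME, megabytes for MAXTMPSTG). Check the command prompts on your own system before running anything it generates.

```python
def build_release_commands(job_number, user, job_name,
                           cputime_ms=None, maxtmpstg_mb=None):
    """Return CL command strings that raise a limit and then release the job.

    Hypothetical helper: it only formats text and does not run anything.
    """
    if cputime_ms is None and maxtmpstg_mb is None:
        # A held job cannot be released until the limit it hit is raised.
        raise ValueError("raise at least one limit before releasing the job")

    qualified = f"{job_number}/{user}/{job_name}"
    parms = []
    if cputime_ms is not None:
        parms.append(f"CPUTIME({cputime_ms})")        # milliseconds
    if maxtmpstg_mb is not None:
        parms.append(f"MAXTMPSTG({maxtmpstg_mb})")    # megabytes
    parm_text = " ".join(parms)

    return [
        f"CHGJOB JOB({qualified}) {parm_text}",
        f"RLSJOB JOB({qualified})",
    ]


# Example with a made-up job: raise the CPU limit to 10 minutes, then release.
for command in build_release_commands("123456", "QUSER", "NIGHTLY",
                                      cputime_ms=600000):
    print(command)
```

The order matters: the CHGJOB command comes first because the job can't be released while it is still above its limit.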
The Change Job (QWTCHGJB) API has been enhanced to support two new keys on the JOBC0100 and JOBC0200 formats:
- Maximum processing unit time allowed, in milliseconds (1302)
- Maximum temporary storage allowed, in megabytes (1305)
This change makes it easier to protect your system from a runaway job that consumes more CPU or uses more temporary storage than expected. By setting these limits higher than any job should legitimately need, you shield the system from the potentially negative effects of such a job. Because the job is held rather than ended, the limits don’t need to be set perfectly. If either limit is hit, you can increase it with the Change Job (CHGJOB) command or the QWTCHGJB API and then release the job so that it continues to run. If the new upper limit is hit, the system will once again hold the job.
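To make the hold, raise, and release cycle concrete, here is a toy model of a job with a CPU time limit. It is not how the operating system implements the feature: the class, its method names, and the choice to hold the job as soon as usage reaches the limit are all assumptions made for illustration.

```python
class SimulatedJob:
    """Toy model of a job with a CPU time limit (not the real implementation)."""

    def __init__(self, cpu_limit_ms):
        self.cpu_limit_ms = cpu_limit_ms
        self.cpu_used_ms = 0
        self.status = "ACTIVE"

    def consume(self, ms):
        """Use CPU time; the job is held (not ended) once the limit is reached."""
        if self.status != "ACTIVE":
            raise RuntimeError("a held job does not run")
        self.cpu_used_ms += ms
        if self.cpu_used_ms >= self.cpu_limit_ms:   # assumed threshold
            self.status = "HELD"

    def change_limit(self, new_limit_ms):
        """Stand-in for changing the limit with CHGJOB or the API."""
        self.cpu_limit_ms = new_limit_ms

    def release(self):
        """Stand-in for RLSJOB; refused until the limit has been raised."""
        if self.cpu_used_ms >= self.cpu_limit_ms:
            raise RuntimeError("raise the limit before releasing the job")
        self.status = "ACTIVE"


job = SimulatedJob(cpu_limit_ms=1000)
job.consume(1000)        # limit hit: the job is held, not ended
job.change_limit(2500)   # operator raises the limit
job.release()            # job continues to run
job.consume(1500)        # new limit hit: the job is held again
print(job.status)
```

The point of the model is that no work is lost at any step: the job keeps its accumulated state while held, and each new limit simply defines the next checkpoint at which an operator gets to decide.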
With this change, you should start to move away from the default *NOMAX values and set appropriate limits. The temporary storage limit is particularly valuable: by setting an upper limit on the class object for the maximum temporary storage a job can use, you can prevent a system outage (but be sure to keep that limit lower than the amount of storage available on the system). Because the job is now held when a limit is hit, you have the opportunity to assess the situation and decide on the best action for the job.
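One way to pick a starting value is to take the largest temporary storage usage you have observed for the jobs that use a class, add generous headroom, and confirm the result is still lower than the storage available on the system. The helper below is a sketch of that idea only; the function name, the sample figures, and the default headroom factor of 2 are arbitrary assumptions, not IBM guidance.

```python
def suggest_maxtmpstg_mb(observed_peaks_mb, available_mb, headroom=2.0):
    """Suggest a MAXTMPSTG value in megabytes (hypothetical heuristic).

    observed_peaks_mb: peak temporary storage seen for normal runs of the jobs
    available_mb:      storage available on the system
    headroom:          multiplier applied to the largest observed peak
    """
    if not observed_peaks_mb:
        raise ValueError("need at least one observed peak")

    candidate = int(max(observed_peaks_mb) * headroom)

    # The limit only protects the system if it is below the available storage.
    if candidate >= available_mb:
        raise ValueError("suggested limit is not lower than available storage")
    return candidate


# Made-up figures: peaks of about 1 GB on a system with about 500 GB available.
print(suggest_maxtmpstg_mb([800, 1200, 950], available_mb=500000))
```

Because a job that hits the limit is held rather than ended, a value chosen this way doesn’t need to be exact; it can be raised later if a legitimate job reaches it.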
I’d like to thank Dan Tarara from the IBM i work management development team for his assistance in writing this blog article.
This blog post was originally published on IBMSystemsMag.com and is reproduced here by permission of IBM Systems Media.