If you are a regular reader of this blog, you have no doubt come to realize how fortunate we are on IBM i in the realm of performance tools and performance management. We have everything from the seamless instrumentation of metrics, to the various performance data collectors (all included in the operating system), to the powerful graphical tools that help analyze the data. Couple that with the benefit of the patented wait accounting technology, and it’s clear that we have an impressive end-to-end suite of technologies at our disposal to monitor, manage and solve performance problems.
Now on to the intangible – the methodology. Even with really great tooling, analyzing the performance of a partition can be a complex task. The amount of data can be overwhelming. It can be hard to know where to start, what tool(s) to use, and whether the data is indicative of a problem or not. Does this sound familiar?
Let’s imagine the following scenario. Every day this week, the users have started to complain about their response times becoming unreasonable around 3:00 p.m., and the slowdown usually lasts for an hour. A typical methodology in this scenario would be to leverage the real-time tools (Active Jobs, Disk Status, System Status, etc.) to gain initial clues about what system resources are being consumed (and a fleeting look at by whom) during the troublesome timeframe. The next step would be to use the capabilities of Collection Services to determine in further detail who is using resources and what resources are being consumed. In addition to the who and what, Collection Services is also used to find bottlenecks, such as what resources jobs are waiting for (e.g., CPU, disk I/O) and the types of contention time, if present.
Sometimes, the problem can be solved with just this information. Perhaps during this timeframe we see a lot of CPU queuing time occurring. So next, we look at who is using CPU, and because we kept a baseline (you are keeping a baseline, right?), we look for changes. What we notice is that a new set of reports for management is now being run at 3 p.m. every day, instead of during off-shift hours, pushing us over the “knee of the curve” for CPU utilization. In this case, the fix may be as simple as rescheduling these reports.
Unfortunately, in many cases it is not so simple and we must continue the investigation. The next step would be to focus our attention on Job Watcher data. If it is not already being collected, we begin to collect it. Job Watcher will give us an even more granular view of where time is being spent, both at a system level and at a job level. It will give us additional clues on wait objects, call stacks and SQL being run. If the problem is not solved using Job Watcher, but we can establish that it is I/O related, we may next turn to Disk Watcher. If we need to get down to the application and program level in order to completely solve the issue, we will use Performance Explorer.
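The baseline comparison in the scenario above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Collection Services or any IBM i tool works: the job names, the CPU-seconds figures, the `cpu_changes` helper and its `min_increase` threshold are all hypothetical, and in practice the numbers would come from your own baseline and problem-window collections.

```python
# Hypothetical CPU seconds consumed per job during the 3-4 p.m. window.
# Job names and values are made up for illustration only.
baseline = {"ORDENTRY": 300.0, "INVQRY": 120.0}
current = {"ORDENTRY": 310.0, "INVQRY": 125.0, "MGMTRPT": 900.0}


def cpu_changes(baseline, current, min_increase=60.0):
    """Return (job, baseline_cpu, current_cpu, delta) tuples for jobs whose
    CPU time grew by at least min_increase seconds, largest increase first.

    Jobs missing from the baseline are treated as having used 0 seconds,
    so newly scheduled work shows up as a large increase.
    """
    changes = []
    for job, cur in current.items():
        base = baseline.get(job, 0.0)
        delta = cur - base
        if delta >= min_increase:
            changes.append((job, base, cur, delta))
    # Sort by the size of the increase so the biggest change is listed first.
    return sorted(changes, key=lambda change: change[3], reverse=True)


for job, base, cur, delta in cpu_changes(baseline, current):
    print(f"{job}: {base:.0f}s -> {cur:.0f}s (+{delta:.0f}s)")
```

With this made-up data, only the new management report job clears the threshold, which mirrors the rescheduling example: the jobs that were already running barely changed, while the newcomer accounts for the extra CPU demand.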
Now, back to my original question: “Does this sound familiar?”
Because we know that many of you nodded your head “yes,” the Lab Services IBM i Performance team was inspired to build a three-day workshop, the “IBM i Performance Analysis Workshop,” to help teach you some of the techniques we use to analyze data and solve performance problems. This workshop is focused not on how to use the tools, but on the methodology used. We will cover the strengths of the collector tools and when to use them, as well as core methodology and analysis techniques for CPU, memory, I/O and various waits. Our goal is to help you build a foundation of performance methodology you can apply in your environment. If you are interested in more details, including enrolling in a public workshop, please see this link:
I’d like to thank Stacy Benfield for writing this blog. Stacy is a member of the Lab Services Performance team and was actively involved in the creation of this workshop. Prior to joining the Lab Services Performance team, Stacy was part of the IBM i development team, working on performance tools, where she was the team leader for the Performance Data Investigator.
This blog post was originally published on IBMSystemsMag.com and is reproduced here by permission of IBM Systems Media.