Wait accounting is the patented technology built into IBM i that tells you what a thread or task is doing when it appears to be doing nothing. It's an IBM i exclusive, possible because the Rochester, Minn., development lab develops every layer of the IBM i stack.
Wait accounting is a powerful capability for detailed performance analysis. This entry focuses on why threads wait and how you can use wait accounting to troubleshoot performance problems or simply improve the performance of your applications.
Let’s start by reviewing some terminology. Most people are familiar with the term “job.” Every job has at least one thread and may have multiple threads. Every thread is represented by a licensed internal code (LIC) task, but tasks also exist without the IBM i thread-level structures. LIC tasks are generally not visible externally except through the IBM i performance or service tools. Wait accounting concepts apply to both threads and tasks; thus, the terms “thread” and “task” are used when referring to an executable piece of work.
A thread or task has two basic states. It can be executing on the processor (this is the running state) or it can be waiting to run on the processor. There are three key wait conditions:
- Ready to run, waiting for the processor. This is a special wait state and is generally referred to as CPU queueing, which means the thread or task is queued and waiting to run on the CPU. One reason CPU queueing can occur is if the partition is overloaded and there’s more work than it can accommodate. Logical partitioning and simultaneous multithreading can also result in CPU queueing; however, these are quite complex topics and are covered in the Job Waits Whitepaper.
- Idle waits. Idle waits are normal and expected wait conditions. Idle waits occur when the thread is waiting for external input from a user, the network or another application. Until that input is received, there’s no work to be done.
- Blocked waits. Blocked waits are a result of serialization mechanisms to synchronize access to shared resources. Blocked waits may be normal and expected — for example, serialized access to updating a row in a table, disk I/O operations or communications I/O operations. However, blocked waits may be abnormal, and it’s these unexpected block points where wait accounting can be helpful.
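The states above can be summarized in a small sketch. This is purely illustrative (IBM i does not expose such an enumeration to applications); it just restates the running state and the three wait conditions in code form:

```python
from enum import Enum, auto

class TaskState(Enum):
    """Illustrative states of a thread or task, as described above."""
    RUNNING = auto()       # executing on a processor
    CPU_QUEUED = auto()    # ready to run, waiting for a processor
    IDLE_WAIT = auto()     # waiting for external input; normal and expected
    BLOCKED_WAIT = auto()  # waiting on a serialized resource; may be abnormal
```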
You can think of the lifetime of a thread or a task in a graphical manner, breaking out the time spent running or waiting. This high-level graphical depiction is called the run-wait time signature.
Traditionally, the focus for improving an application’s performance was to have it use the CPU as efficiently as possible. On IBM i with wait accounting, we can examine the time spent waiting and understand what contributed to that wait time. If elements of waiting can be reduced or eliminated, the overall performance can also be improved.
Nearly all of the wait conditions in IBM i have been identified and enumerated – that is, each unique wait point is assigned a numerical value. The 6.1 release includes 268 unique wait conditions! Keeping track of so many unique wait conditions for every thread and task would consume too much storage, so IBM uses a grouping approach. Each unique wait condition is assigned to one of 32 groups, or buckets. As threads or tasks go into and out of wait conditions, the task dispatcher maps the wait condition to the appropriate group.
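As a rough illustration of this grouping approach, here is a minimal Python sketch. The wait-point IDs, the mapping table, and the bucket assignments are all hypothetical; the real mapping lives inside the LIC task dispatcher and is not visible to application code:

```python
# Hypothetical mapping of enumerated wait points to one of 32 groups (buckets).
# These IDs and assignments are made up for illustration.
WAIT_POINT_TO_BUCKET = {
    101: 5,   # a disk-read wait point
    102: 5,   # another disk-read wait point, same bucket
    240: 17,  # a record-lock wait point
}

class ThreadWaitAccounting:
    """Accumulates per-thread wait time into 32 buckets, rather than
    keeping a separate counter for every unique wait condition."""

    def __init__(self):
        self.bucket_usecs = [0] * 32  # microseconds accumulated per bucket

    def record_wait(self, wait_point: int, usecs: int) -> None:
        # The dispatcher maps the wait condition to its group as the
        # thread goes into and out of the wait.
        self.bucket_usecs[WAIT_POINT_TO_BUCKET[wait_point]] += usecs

acct = ThreadWaitAccounting()
acct.record_wait(101, 4000)    # two short disk reads...
acct.record_wait(102, 6000)
acct.record_wait(240, 250000)  # ...and one longer record-lock wait
print(acct.bucket_usecs[5], acct.bucket_usecs[17])  # 10000 250000
```

The payoff of the bucket design is fixed, small storage per thread: 32 counters, no matter how many distinct wait points the operating system defines.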
If we take the run-wait time signature using wait accounting, we can now identify the components that make up the time the thread or task was waiting. If the thread's wait time was due to reading and writing data to disk, locking records for serialized access, and journaling the data, the signature would break the wait time out into those components.
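To make that breakdown concrete, here is a small sketch that turns bucket times into a run-wait time signature summary. All of the component names and numbers are invented for illustration:

```python
# Summarize a run-wait time signature as percentages of elapsed time.
# All numbers below are invented for illustration.
elapsed_secs = 60.0
signature = {
    "CPU (running)": 12.0,
    "Disk reads":    18.0,
    "Disk writes":    9.0,
    "Record locks":  15.0,
    "Journaling":     6.0,
}
for component, secs in signature.items():
    pct = 100.0 * secs / elapsed_secs
    print(f"{component:14s} {secs:5.1f}s  {pct:5.1f}%")
```

A summary like this immediately shows where reducing waits would pay off; here, disk I/O and record locks dominate the signature.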
When you understand the types of waits that are involved, you can start to ask yourself some questions. For this situation, you could ask:
- Are disk reads causing page faults? If so, are my pool sizes appropriate?
- What programs are causing the disk reads and writes? Is there unnecessary I/O that can be reduced or eliminated? Or can the I/O be done asynchronously?
- Is my record-locking strategy optimal? Or am I locking records unnecessarily?
- What files are being journaled? Are all the journals required and optimally configured?
You’ll see many of these wait groups surface if you do wait analysis on your application. Understanding what your application is doing and why it’s waiting in those situations can possibly help you reduce or eliminate unnecessary waits.
Holders and Waiters
Not only does IBM i keep track of what resource a thread or task is waiting on, it also keeps track of the thread or task that has the resource allocated to it. This is a very powerful feature. A holder is the thread or task that’s using the serialized resource. A waiter is the thread or task that wants access to that serialized resource.
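The holder/waiter relationship can be sketched as a simple data structure. This is only a conceptual model in application-level Python; on IBM i the tracking is done inside the LIC, and the job and resource names here are made up:

```python
class SerializedResource:
    """Conceptual model of holder/waiter tracking for one serialized resource."""

    def __init__(self, name: str):
        self.name = name
        self.holder = None   # thread or task currently using the resource
        self.waiters = []    # threads or tasks blocked wanting the resource

    def acquire(self, thread: str) -> bool:
        if self.holder is None:
            self.holder = thread   # resource was free; thread becomes the holder
            return True
        self.waiters.append(thread)  # blocked wait: record who is waiting
        return False

    def release(self) -> None:
        # Hand the resource to the longest-waiting thread, if any.
        self.holder = self.waiters.pop(0) if self.waiters else None

row_lock = SerializedResource("row in ORDERS table")
row_lock.acquire("JOB_A/THREAD_1")
row_lock.acquire("JOB_B/THREAD_1")  # blocks; JOB_A is still the holder
print(row_lock.holder, row_lock.waiters)
```

Because both sides are recorded, analysis can answer not just "what is this thread waiting for?" but also "who is making it wait?", which is usually the faster path to the root cause.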
IBM i also manages call stacks for every thread or task. This is independent of the wait accounting information. The call stack shows the programs and procedures that have been invoked and can be very useful in understanding the wait condition since the call stack gives an outline of the logic that led up to either holding a resource or wanting to get access to it. The combination of holder, waiter and call stacks provides a very powerful capability to analyze wait conditions. No other operating system provides such rich function.
Collecting and Analyzing the Data
Collection Services and Job Watcher are two performance data collection mechanisms that collect the wait accounting information. Job Watcher also collects holder and waiter information, as well as call stacks. Once the performance data has been collected, you can analyze it graphically. In IBM i 6.1, the IBM Systems Director Navigator Web console includes the Performance tasks; its Investigate Data feature can be used to view wait data graphically through a browser interface. Alternatively, you can use the iDoctor client to view wait data.
The Redbooks publication End to End Performance Management on IBM i covers the Investigate Data feature and wait accounting in much more detail.
This blog post was edited for currency on January 26, 2020.
This blog post was originally published on IBMSystemsMag.com and is reproduced here by permission of IBM Systems Media.