Sponsored content from IntelliMagic
Computers are great at doing what we tell them. They don’t get tired, bored, or complain about the endless work we have them doing. The tasks that have been given to computers range from the apps on your mobile device to the DevOps efforts that have helpfully provided hundreds of shortcuts in how we manage the work coming into important processing infrastructure.
What Is the Underlying Noise?
On the day of the O.J. Simpson verdict, Oct. 3, 1995, transaction rates plummeted at the Global Distribution System company I was working with at the time (The Apollo Reservation System), and, I assume, at many other sites around the world. The reason? Most people with access to TV (150 million) and radio were waiting with great anticipation for the verdict. The lull was very brief, maybe 5 minutes, and the transactions that did occur during it were likely made either through automation or by people oblivious to the trial.
I know this because the transaction rates at Apollo were carefully monitored and evaluated as we (the performance analysts) managed the performance of the online systems and subsystems. Travel agents were the primary users of Apollo’s system, so they were a bit ahead of the curve on automating repetitive sequences, but automation was making its way into businesses everywhere as Windows devices made doing so easier.
Turns out the O.J. trial sparked my interest in the topic of automation.
Is Your Infrastructure on the 'Stairway to Heaven'?
Besides being the title of a classic rock ’n’ roll song, those words may also describe the endless growth driving your infrastructure, and that is hardly a heavenly outcome. The growth could be in CICS or DDF transaction rates, or in something else. Even though CICS and DB2 are heavily instrumented, monitored, and analyzed, we still often see a disconnect between business growth and resource demand.
Let’s be clear. CICS and DB2 are often NOT the source of the problem, but rather victims of their own success.
The need for speed in our culture has also been a significant contributor to infrastructure demand.
For example, 30 years ago most of us were getting weather updates once a day in the newspaper or on the TV. Today, I can easily set a weather app on my phone to give me updates several times a day and, more often, near real-time updates of historical and forecast radar — for example, when a Texas-sized thunderstorm is passing through.
Using the WLM Policy to Reduce Impact of Lower-importance Work
One of the main competitive advantages for z/OS is the integration of workload priority into the operating system. Intel-based systems play “nice”, and virtualization lets you share some resources disproportionately, but nothing touches the breadth and depth of the z/OS Workload Manager, or WLM, on IBM Z.
Information is power; fast information can be “instant” power, and, in some cases, can save lives. We shouldn’t dismiss the business value in speed, but needless repetition comes with a cost.
A good capacity planner should work to answer many questions, especially:
- What is the overall requirement for ‘robotic’ activity on our systems?
- Is there a way to tune it out?
Let’s answer the second question first and then spend time evaluating some ways to help you be the hero as you unmask the horde of robots and tune them out, or eliminate them if they are unnecessary.
The performance index (PI) is one of the measures that shows how WLM is managing dispatch priority in the z/OS operating system. A PI of 1 tells me that my work is getting service in line with the goal set for it. A PI higher than 1 tells me that it is getting less, so the dispatcher should favor it over work with lower priority. A PI lower than 1 suggests that my work is getting more than has been established for it compared to others, which makes it a donor candidate when the system is resource constrained.
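As a rough illustration of the PI arithmetic, the Python sketch below uses the commonly documented definitions: for a response-time goal, PI is the actual response time divided by the goal, and for an execution-velocity goal, PI is the goal velocity divided by the achieved velocity. The function names and sample values are hypothetical, and WLM's own calculation (percentile goals, for example) has more detail than is shown here.

```python
def pi_response_time(actual_seconds: float, goal_seconds: float) -> float:
    """PI for an average response-time goal: actual / goal."""
    if goal_seconds <= 0:
        raise ValueError("goal_seconds must be positive")
    return actual_seconds / goal_seconds


def pi_velocity(actual_velocity: float, goal_velocity: float) -> float:
    """PI for an execution-velocity goal: goal / achieved."""
    if actual_velocity <= 0:
        raise ValueError("actual_velocity must be positive")
    return goal_velocity / actual_velocity


def interpret(pi: float) -> str:
    """Translate a PI value into the plain-language reading used above."""
    if pi > 1.0:
        return "missing goal (should be favored)"
    if pi < 1.0:
        return "beating goal (donor candidate)"
    return "meeting goal"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(interpret(pi_response_time(actual_seconds=0.75, goal_seconds=0.5)))
    print(interpret(pi_velocity(actual_velocity=60.0, goal_velocity=40.0)))
```

In this sketch the first call describes work that is slower than its goal, and the second describes work running faster than its goal, which is the kind of service class that can give up resources when the system is constrained.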
If you are a z/OS performance analyst, you are probably aware of this. People are creative, and the computer doesn’t have to wait for a human to think about what to type and then type something in to get an answer…so some work gets automated.
Specific transactions might be repeated on a specific interval, driven by the source to obtain metadata on the transaction type, or to look for change in the output, so that the next actions can take place or decisions can be made. Human creativity in assigning tasks to robots is why we have so many useful examples in our homes today. Assigning a legitimate robot a lower priority can be an excellent way to reduce impact during peak demand.
But what if nobody cares about that automated 5-minute report anymore? What if the response-time bots are no longer required because we have 16 other tools doing that work? Maybe it’s time to understand the usage a bit more. How many robots are actually out there?
Hunting Robots
Before we get carried away like Will Smith’s character in the movie “I, Robot,” remember that we don’t have to be paranoid. These robots haven’t destroyed their makers, so we should work with the business. The hunt can start by narrowing your search. The example below provides some answers to questions like:
- Why is the transaction rate constant all day?
- Is it just one source?
- What is the source?
With a few dynamic reporting edits into a CICS transaction report, the details are easily exposed, and you can work with the robot owners to explain your inquiry.
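One way to narrow the search programmatically is to look for sources whose transaction counts barely change from interval to interval. The Python sketch below is only an illustration of that idea: the input layout (a mapping of source name to per-interval counts), the source names, and the 5% threshold are all assumptions, not the format of any particular CICS report or product.

```python
from statistics import mean, pstdev


def flag_steady_sources(counts_by_source, max_cv=0.05, min_intervals=8):
    """Return sources whose per-interval transaction counts barely vary.

    counts_by_source maps a source name to a list of transaction counts,
    one per reporting interval. A low coefficient of variation
    (standard deviation / mean) across the day suggests a scheduled
    process rather than people at keyboards.
    """
    flagged = []
    for source, counts in counts_by_source.items():
        if len(counts) < min_intervals:
            continue  # too little data to judge
        average = mean(counts)
        if average == 0:
            continue  # idle source, nothing to flag
        cv = pstdev(counts) / average
        if cv <= max_cv:
            flagged.append((source, average, cv))
    # Steadiest sources first.
    return sorted(flagged, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical counts per 15-minute interval, for illustration only.
    sample = {
        "TERM_POOL_A": [120, 340, 910, 1250, 1180, 990, 400, 150],
        "SCRIPT_HOST_7": [15, 15, 15, 15, 15, 15, 15, 15],
    }
    for source, average, cv in flag_steady_sources(sample):
        print(f"{source}: avg {average} tran/interval, cv {cv:.3f}")
```

A flat line like this is a lead, not a verdict. The owner of the source is still the one who can say whether the work is needed.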
This also provides the basis for answering an important question: the cost. Why not have that answer ready when you make the call? A constant 120 seconds of CPU at 15-minute intervals is about 13% of one CP. This is a relatively small-scale example, but it is very likely there’s more to be found, and there are more robots being set up every day.
This could be one of the primary reasons you have unexplained growth in transaction rates and CPU demand. If your licensing model is Tailored Fit Pricing (TFP), then all consumption matters. If you have remained on rolling four-hour average (4HRA) pricing, it is still likely that the above consumption is contributing to the peak 4HRA because of the constant hum some of these workloads exhibit. If you have 100 CPs, this is a very small impact, but you can do the math on your software agreement to quickly come up with one primary driver of the IT cost, and you’ll be prepared to share that with the robot owner when you call. In doing the math, you may observe that this is a pretty hefty process for an online transaction (8 CPU sec/tran). Is this staccato process driving more functionality than needed?
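The arithmetic behind these figures is simple enough to sketch in Python. The inputs (120 CPU seconds per 15-minute interval, 8 CPU seconds per transaction, 100 CPs) come from the example above. The function names are hypothetical, and the implied transaction count assumes both CPU figures describe the same workload.

```python
def cp_busy_fraction(cpu_seconds: float, interval_seconds: float) -> float:
    """Fraction of one CP kept busy by cpu_seconds of work per interval."""
    return cpu_seconds / interval_seconds


def implied_transactions(cpu_seconds: float, cpu_seconds_per_tran: float) -> float:
    """Number of transactions needed to burn cpu_seconds at a given cost each."""
    return cpu_seconds / cpu_seconds_per_tran


if __name__ == "__main__":
    interval_seconds = 15 * 60  # one 15-minute interval
    busy = cp_busy_fraction(120.0, interval_seconds)
    trans = implied_transactions(120.0, 8.0)

    print(f"Share of one CP: {busy:.1%}")
    print(f"Share of a 100-CP machine: {busy / 100:.2%}")
    print(f"Transactions per interval: {trans:.0f}")
    print(f"Transactions per minute: {trans / 15:.1f}")
```

Multiplying the CP share by your own software cost per unit of capacity gives the number to bring to the robot owner. That rate is specific to each agreement, so it is left out of the sketch.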
Eliminate, Reduce, or Prioritize?
First, ask the owner what is using the data every minute, then ask what the value of every minute is compared to every 30 minutes. One minute might be “convenient,” but if it is only needed once a day, why not make it once a day instead of once a minute and save some cash for IT work that will bring in higher margin business?
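To put numbers behind that conversation, the Python sketch below compares daily CPU consumption at different run frequencies. It assumes the 8 CPU seconds per run from the earlier example and that the cost per run stays the same however often the process runs. Both the function name and that constant-cost assumption are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def daily_cpu_seconds(cpu_seconds_per_run: float, period_minutes: float) -> float:
    """CPU seconds consumed per day when a process runs every period_minutes."""
    runs_per_day = MINUTES_PER_DAY / period_minutes
    return cpu_seconds_per_run * runs_per_day


if __name__ == "__main__":
    cost_per_run = 8.0  # CPU seconds per run, taken from the earlier example
    schedules = [
        ("every minute", 1),
        ("every 30 minutes", 30),
        ("once a day", MINUTES_PER_DAY),
    ]
    baseline = daily_cpu_seconds(cost_per_run, 1)
    for label, period in schedules:
        used = daily_cpu_seconds(cost_per_run, period)
        saved = 1 - used / baseline
        print(f"{label}: {used:.0f} CPU sec/day ({saved:.1%} saved)")
```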
If it must be there, the next question should be: What is the priority compared to other online work? A CICS transaction will generally have much higher priority than batch work. As mentioned earlier, WLM provides various ways to prioritize workload in the system, and there may be a convenient method to define this as discretionary CICS work. This may not save as much resource, but at least you are helping steer the workload out of the way of higher priority work when resource availability is slim.
Other Considerations
These principles and methods can also be applied to DDF work and other workloads. When the pattern that emerges consistently is similar to a looping process, it may just be an innocent robot doing its job and there may be no business loss to slowing it down.
Vastly improved networks and access to vast cloud resources, along with easy-to-use scripting, have created some real problems with “autobots.” I’ve observed this situation when new or prospective business partners used multiple cloud-based sources to “check” the scalability of production applications. The business is going to pursue new revenue sources aggressively, and you should be prepared to gate this production scaling evaluation with WLM so that these activities cannot disrupt production.
Final Thoughts
Mainframes continue to be the primary source of data for numerous applications, and the human creativity in mobile apps shows up in workload behaviors that IT managers will need to understand in order to keep operating efficiently. If your leaders are looking for reasons for the unexplained growth in your system, this example provides one place to look. Improve your ability to identify and manage these changes effectively and you will continue to demonstrate the competitive advantages mainframe technology offers.
Jack Opgenorth is a senior technical consultant with IntelliMagic and has been involved in IT infrastructure optimization for over 27 years. He has enjoyed working for a variety of organizations and technologies and has been involved with high transaction environments as a technical lead for enterprise technology delivery organizations in travel, transportation, retail, and government sectors.