By Joe Clabby, President, Clabby Analytics
The term “tip-of-the-iceberg” refers to a small evident part or aspect of something largely hidden. And, at this juncture, this is an ideal term to describe IBM’s new zAware program.
IBM zAware is the acronym for IBM System z Advanced Workload Analysis Reporter – a program designed to automatically assist in mainframe troubleshooting by analyzing minute detail in systems logs (as well as other data relevant to systems and application performance) in order to isolate problems or to help discover the cause of anomalous behaviors.
The program works by analyzing vast volumes of messages, comparing message traffic during known-good operations to message traffic during the current time period. When a performance or reliability issue occurs, IBM zAware points operations staff to any unusual message traffic. In this respect, zAware is essentially an analytic message-based troubleshooter.
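To make the baseline-versus-current comparison concrete, here is a minimal Python sketch of the general idea. It is not IBM's algorithm, which this article does not describe in detail. The function names, the message IDs, and the simple rate-ratio threshold are all assumptions made for illustration.

```python
from collections import Counter


def message_rates(message_ids):
    """Return each message ID's share of the total traffic in a window."""
    counts = Counter(message_ids)
    total = sum(counts.values())
    return {msg_id: n / total for msg_id, n in counts.items()}


def flag_unusual(baseline_ids, current_ids, ratio_threshold=5.0):
    """Flag message IDs whose share of current traffic is unusual.

    Hypothetical rule, chosen only for illustration: a message is unusual
    if it never appeared in the known-good baseline, or if its share of
    traffic is at least `ratio_threshold` times its baseline share.
    """
    baseline = message_rates(baseline_ids)
    current = message_rates(current_ids)
    flagged = []
    for msg_id, rate in current.items():
        base_rate = baseline.get(msg_id)
        if base_rate is None:
            flagged.append((msg_id, "never seen in baseline"))
        elif rate / base_rate >= ratio_threshold:
            flagged.append((msg_id, f"{rate / base_rate:.1f}x baseline rate"))
    return sorted(flagged)


# Made-up message IDs standing in for real system log messages.
known_good = ["MSG001I"] * 90 + ["MSG002I"] * 9 + ["MSG003W"] * 1
right_now = ["MSG001I"] * 80 + ["MSG003W"] * 15 + ["MSG999E"] * 5

for msg_id, reason in flag_unusual(known_good, right_now):
    print(msg_id, reason)
```

Even this toy version shows why a baseline matters: a message that is routine in small numbers only stands out once its rate is compared against what normal operation looks like.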
This program is exciting from a number of angles:
- It can help information technology (IT) managers identify the root cause of a problem faster than using the traditional trial-and-error guesswork approach (it is especially useful for identifying needle-in-a-haystack type problems).
- It can help prevent snowballing degradation (caused by a problem) from impacting the system further. It does this by flagging traffic that looks unusual and helping programmers quickly direct focus to the unusual behavior.
- It can improve operational efficiency by showing mainframe managers where problems reside (these managers then spend less time trying to find problems and more time fixing them). And,
- zAware opens the door for more and more automated management/analysis programs that will greatly simplify mainframe management over time (for instance, zAware’s XML output is now consumed by Tivoli, which can potentially add further troubleshooting capability; in addition, NetView can use IBM zAware data to improve problem determination).
This fourth point is very important to grasp. Today’s mainframes generate a lot of message/status traffic, far more than human managers can track and analyze. So, with zAware, IBM has applied its vast analytics knowledge to message traffic analysis. This effort will initially streamline mainframe troubleshooting, but it has the potential to be expanded across the entire mainframe management environment.
In short, analytical problem analysis could be used in the future to streamline configuration management, to tune applications (advanced application performance management), to support security deployment and operations, and more. And the net result could be that someday, by using zAware and related analytics programs, the mainframe could practically manage itself, with very little human involvement. (Mainframe managers would not “go away”; instead they would take on new tasks that align business operations with underlying technology to achieve new levels of efficiency in service delivery.)
How Is zAware Being Used Today?
Unfortunately, at this time, there are few customers with enough experience with zAware to provide many use case scenarios regarding zAware within the enterprise. The primary reason for this situation is that the product has only been recently released – and it is necessary to capture at least 90 days of system information in order to establish a baseline for “normal operations.” Customers are just starting to deploy zAware now.
Still, IBM was able to provide me with a few use cases for zAware based upon internal usage. The first is based on a development System z deployed in Poughkeepsie, NY. In this case, an IBM IMS database was having a problem (the problem message informed database managers that the definition of a data file was missing).
This particular problem was generating a lot of message traffic – but the cause for this problem was unknown. IBM’s zAware had been deployed on this system during its development – and, accordingly, zAware had a good understanding (based upon the data that had been gathered) about what the environment looked like when operating correctly.
By analyzing the difference between the known-good configuration data and the problem at hand, zAware was able to show IT managers that a configuration mistake had been made by operators who had reconfigured VTAM (virtual telecommunications access method – the subsystem that implements communications).
If this problem had not been isolated by zAware, it could have cascaded, potentially causing a VTAM communications failure. As mainframe operators know, failures are unacceptable on mainframes (mainframes can go decades without a failure, which is why mainframes have the best mean-time-between-failures average in the industry). So, in this case, zAware helped identify a potential failure that could have caused a major mainframe communications outage.
IBM also provided another internal zAware scenario: an LDAP server failure. In this case, an LDAP directory server in a test environment would ABEND (abnormally end) and then restart. IBM zAware was used to isolate the cause of this failure, and it found a message that described the root of the problem. This message had been overlooked by systems administrators, but zAware was able to “highlight” that this particular LDAP server needed to be reconfigured with more storage so that the problem could be debugged.
Although two use cases are not much to go on in terms of how customers are using zAware, they do represent a start. These use cases show how answers to problems can be found by analyzing message traffic, and how zAware can be used to sift through mountains of message traffic to identify problems quickly, leading to more rapid problem resolution.
The Bottom Line
When problems occur, systems administrators need to get a handle on the source of the problem – and be positioned to fix that problem rapidly. And some mainframe shops may not have the right skills in the right place to perform in-depth root cause problem determination. This is why a tool such as zAware is so valuable: it saves time while helping to address skills shortage issues.
Based upon what we’ve seen to date, and what I believe the future of this product could be, I think we’ve only seen the tip of the iceberg when it comes to machine-driven advanced analytics management tools.