The acceleration of digital initiatives is giving the mainframe new opportunities to show its strength. As leading organizations deliver unprecedented digital experiences, behind each click is massive complexity, typically involving hundreds of interwoven services that connect hybrid IT infrastructures, from billions of end-user devices to multi-cloud and mainframe services. The mainframe platform is built to handle the exponential growth driven by digital transformation. To manage that growth and complexity while ensuring exceptional customer experiences, more and more organizations are applying AI to their IT operations (AIOps).
AIOps combines big data and machine learning (ML) algorithms to augment and automate day-to-day IT operations tasks, ranging from performance monitoring and reporting to data correlation and analysis. Here are three keys to successfully implementing an AIOps strategy that includes the mainframe:
1. Break down data silos to analyze issues faster
Data is the key to AIOps. The difficulty is that the ever-increasing volume of data, which is doubling year over year, resides in multiple domains across hybrid IT environments, and those domains are typically siloed. A broad set of data collectors is needed to feed the ML algorithms, but collection alone is not enough. For the data to be useful, you also need the ability to visualize and analyze it in context, so that you can act on the insights it yields more efficiently and effectively.
Meaningful, actionable insights come from information gathered and analyzed across your entire IT environment. That breadth helps you understand how one piece of data is connected to the next and how important each insight is to the business.
This raises the question: how can siloed data be transformed into actionable insights? Modern operational tools tend to go either broad or deep, not both. There are tools that provide a broad but thin layer of visibility into what is going on across IT domains, such as infrastructures, networks, storage devices, databases, and applications. There are also tools that deliver deep but narrow visibility into specific domains and platforms. While these tools provide great diagnostic value, they are not sufficient in themselves to break down data silos.
The reality is there is no one tool that does it all. Silos can only be broken down by embracing an open architecture that enables you to integrate curated data from across your entire hybrid technology stack — from mobile to mainframe to multi-cloud. This does not mean taking all your raw data and throwing it into one gigantic data lake. If you do, you’ll end up with a data swamp: a stagnant pool clogged with untold amounts of useless data. Instead, using an open architecture approach, you can augment and further curate the data with meaning behind each of the relationships.
When your IT operations tools embrace the use of open APIs to gather analytics, you gain the ability to view your curated data from different perspectives and share hidden insights across teams, thereby achieving greater efficiencies. You don’t need to rip and replace all your tools in order to begin gaining the benefits of AIOps. Instead, you can build on your current investments by leveraging open APIs to integrate the data you are already collecting.
For example, suppose you have a situation where a network is running too slowly. Data visibility from one tool may limit you to only what is happening within your network environment. However, if the slowdown was being caused by activities in a mainframe storage device, the root cause of the issue would remain unclear and just out of view. The ability to visualize data in context across domains and platforms reveals what is taking place and why it is happening, enabling teams to work together cross-functionally to resolve issues swiftly.
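To make the slow-network scenario concrete, here is a minimal Python sketch of cross-domain correlation. It assumes metrics from each domain have already been collected and aligned on the same timestamps. The metric names and sample values are invented for illustration, and a plain Pearson correlation stands in for whatever analysis a real AIOps tool would apply.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_suspects(symptom, candidates):
    """Rank candidate metrics by absolute correlation with the symptom."""
    scores = {name: pearson(symptom, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-minute samples from three different domains.
network_latency_ms = [20, 21, 19, 22, 48, 75, 90, 88]
candidates = {
    "network.packet_loss_pct": [0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1],
    "mainframe.storage_io_wait_ms": [2, 2, 3, 2, 9, 15, 19, 18],
    "cloud.api_error_rate_pct": [0.5, 0.4, 0.6, 0.5, 0.5, 0.4, 0.6, 0.5],
}

ranked = rank_suspects(network_latency_ms, candidates)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

With this made-up data, the mainframe storage metric ranks first even though the symptom appeared in the network. Correlation does not prove cause, so the ranking is best read as a hint about which team to bring into the conversation.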
2. Reveal hidden insights to shift from recovery to avoidance
In addition to enabling efficient cross-functional analysis, AIOps allows data to be mined for patterns via machine learning. These patterns reveal hidden insights and alert you to potential issues sooner, letting you shift from a reactive recovery model to a proactive avoidance model.
Siloed data makes proactive operations nearly impossible. Too much data is missing from the equation. But, when data is synthesized and analyzed from multiple environments, and is combined with human expertise, the proactive analysis takes on higher levels of accuracy. Potential problems or abnormal trends can be identified early enough to remediate issues before they impact the business.
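One simple way to picture "early enough" is trend projection. The sketch below fits a least-squares line to recent samples and estimates how many sampling intervals remain before a limit is crossed. The function name, the sample values, and the limit are assumptions made for illustration; production tools typically use richer models than a straight line.

```python
def intervals_until_breach(samples, limit):
    """Estimate the sampling intervals left before the trend crosses `limit`.

    Returns None when there is too little data or the trend is not rising.
    Samples are assumed to be evenly spaced in time.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (limit - intercept) / slope  # x position where the line hits the limit
    return max(0.0, crossing - (n - 1))     # measured from the latest sample


# Hypothetical storage utilization (%), one sample per interval.
storage_used_pct = [60, 62, 64, 66, 68]
remaining = intervals_until_breach(storage_used_pct, limit=90)
print(remaining)
```

A positive result gives operators a rough window in which to act. A result of None means the trend is flat or falling, or there is not enough data to fit a line.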
Using AIOps to generate proactive insights can also help address IT skills gaps. For example, mainframe operations have the benefit of people with decades of experience. However, these people are retiring and taking their tribal knowledge with them. AI and ML can be used to collect and codify this knowledge so that it is not lost. That knowledge will then contribute to proactive insights that the next generation of operators can use to keep business-critical operations running smoothly.
3. Use automation to advance towards self-healing systems
Once AIOps is providing accurate insights in context, it is just one more step to have AIOps act upon those insights, remediating issues automatically before they impact the business. This is the ideal state: AIOps sends an alert as soon as an abnormal trend or possible issue is identified, quickly isolates the problem and diagnoses the source, and automates an appropriate response. No human intervention required.
Such automation does not happen all at once. It is best to automate slowly and methodically as you build your AIOps structure, starting with simple tasks and working up to more complex actions. For instance, you might want to automate reallocation of storage on demand to improve performance, optimize capacity to save on costs, or temporarily expand an MQ queue based on a workload spike.
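As an illustration of starting with a simple task, the following Python sketch plans a temporary queue expansion when a workload spike pushes a queue close to its limit. Everything here is hypothetical: `QueueState`, `plan_queue_action`, the thresholds, and the queue name are invented for the example, and the function only returns a proposed action rather than calling any real MQ administration interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueState:
    name: str
    depth: int       # messages currently on the queue
    max_depth: int   # configured limit


def plan_queue_action(queue: QueueState, high_water: float = 0.8,
                      growth: float = 1.5, ceiling: int = 50_000) -> Optional[dict]:
    """Propose an action for a queue, or None if it is below the high-water mark."""
    if queue.depth / queue.max_depth < high_water:
        return None
    new_max = min(int(queue.max_depth * growth), ceiling)
    if new_max <= queue.max_depth:
        # Already at the ceiling: hand the decision to a person.
        return {"action": "escalate", "queue": queue.name}
    return {"action": "expand", "queue": queue.name, "new_max_depth": new_max}


calm = plan_queue_action(QueueState("ORDERS.IN", depth=4_000, max_depth=10_000))
spike = plan_queue_action(QueueState("ORDERS.IN", depth=9_000, max_depth=10_000))
capped = plan_queue_action(QueueState("ORDERS.IN", depth=45_000, max_depth=50_000))
print(calm, spike, capped, sep="\n")
```

Keeping the decision separate from the execution makes it easy to run the rule in a "suggest only" mode first and review what it would have done before letting it act.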
If even “simple” automation sounds daunting or time-consuming, remember that ML can help dramatically. For example, instead of writing and maintaining hundreds or even thousands of lines of code to detect whether your system is trending out of the norm, ML algorithms, trained using your curated data, can detect these patterns with just a few lines of code. Initially, the automation you put in place may only save you 5 minutes here or 10 minutes there. But minutes quickly add up to hours and days that can be spent on other value-added tasks. Automation can also help improve the stability of your system by enabling rapid corrections from an undesired to a desired state, preventing issues from happening in the first place.
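To give a feel for how little code a learned baseline can need, here is a minimal sketch that flags a reading as out of the norm when it sits more than a set number of standard deviations from recent history. This is a simple statistical stand-in rather than a trained ML model, and the function name, threshold, and CPU figures are assumptions made for illustration.

```python
from statistics import mean, stdev


def out_of_norm(history, latest, threshold=3.0):
    """Return True when `latest` is more than `threshold` standard
    deviations away from the mean of `history`."""
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return latest != baseline  # any change from a perfectly flat baseline
    return abs(latest - baseline) / spread > threshold


# Hypothetical CPU utilization (%) observed during normal operation.
cpu_history = [41, 43, 40, 42, 44, 41, 43, 42]

print(out_of_norm(cpu_history, 43))  # a reading close to the baseline
print(out_of_norm(cpu_history, 58))  # a reading far above the baseline
```

The baseline here comes from the data itself, so there are no hand-maintained limits per system. Tuning `threshold` trades sensitivity against false alarms.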
By using an incremental AIOps approach that incorporates built-in feedback loops, you can establish “trusted automation” over time. Your IT operations can be shifted so that your personnel no longer have to handle routine matters, manage policies, or resolve the majority of issues that arise: all that will run on autopilot under the auspices of AIOps.
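One way to picture such a feedback loop is a small tracker that keeps an action under human approval until it has succeeded several times in a row, and puts it back under approval after any failure. The class name, the promotion rule, and the action name below are hypothetical; they sketch the idea rather than describe how any particular AIOps product establishes trust.

```python
class TrustTracker:
    """Track consecutive successes per action to decide who may run it."""

    def __init__(self, promote_after=5):
        self.promote_after = promote_after
        self.streaks = {}

    def record(self, action, succeeded):
        """Feed back the outcome of one run of `action`."""
        if succeeded:
            self.streaks[action] = self.streaks.get(action, 0) + 1
        else:
            self.streaks[action] = 0  # a failure sends the action back for review

    def mode(self, action):
        """Return 'automatic' once the success streak is long enough."""
        if self.streaks.get(action, 0) >= self.promote_after:
            return "automatic"
        return "needs_approval"


tracker = TrustTracker(promote_after=3)
modes = [tracker.mode("expand_queue")]
for _ in range(3):
    tracker.record("expand_queue", succeeded=True)
modes.append(tracker.mode("expand_queue"))
tracker.record("expand_queue", succeeded=False)
modes.append(tracker.mode("expand_queue"))
print(modes)
```

A consecutive-success rule is deliberately conservative: one bad outcome is enough to bring a person back into the loop until confidence is rebuilt.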
This pragmatic approach encourages incremental change at a sustainable rate. The changes add up over time to transform operations on your mainframe platform and thereby add significant value to your business.
Hear from a global financial services company and a Belgian healthcare organization about how they’re improving their mainframe operations with AIOps in the SHARE Virtual Summit 2021 session “AIOps: Anticipate the Unexpected”, or visit mainframe.broadcom.com/aiops to learn more about how Broadcom can help you implement AIOps across your hybrid IT environment.