Sponsored content from Broadcom
How fast can your organization identify potential or actual issues, analyze the root cause of those issues, and resolve or remediate them to prevent any significant business impact? Whatever your answer is, it probably comes down to “not fast enough.”
Enterprise organizations with hybrid IT infrastructures face major challenges in issue resolution. First, there is more complexity than ever before. Digital transformation is driving an increase in operational data, application changes, and transaction volume, all of which correlate with an ever-growing number of alerts to be managed. Second, IT teams tend to monitor in a siloed fashion, which impedes their ability to detect, diagnose, and act on complex issues in hybrid IT environments. Consequently, issues are often discovered too late to avoid business impact. Finally, too much pressure is being placed on too few operational experts to handle issue resolution.
Artificial intelligence for IT operations (AIOps) can help you overcome these challenges to accelerate issue resolution and minimize or eliminate business impact. Specifically, an AIOps approach inclusive of mainframe enables you to visualize and act on alerts that matter across your hybrid environment, empowers operators to speed response, and improves collaboration to reduce mean time to resolution (MTTR). Consider how AIOps for the mainframe can change business outcomes through these real-world examples.
Finding the Needle in the Haystack
A constant challenge facing organizations with complex IT infrastructures is finding the needle in the haystack. Organizations are so overwhelmed by alerts that they cannot determine where to focus efforts. Consequently, important issues may go unresolved and negatively impact customers.
AIOps leverages machine learning to sift through all of the alerts generated in your mainframe environment and focus on the ones that matter most to your operations, reducing the noise and making those alerts easier to visualize and act on. An AIOps solution with auto-discovered topology that visually overlays alerts on business applications lets you see your critical business application dependencies and where there are potential or current impacts, so you can act quickly to isolate and remediate issues. Automated discovery of topology components is particularly useful when trying to visualize complex mainframe systems, since performing this task manually is typically very time-consuming and often results in a topology view that is outdated by the time it’s created.
Clustering capabilities also help reduce noise by intelligently grouping alerts that are all related to the same underlying issue, even when they come from different tools. Instead of seeing 30 individual alerts from three products, you see one issue with 30 alerts from MQ, CICS, and Db2. This gives you a clear view of how alerts are related and what actions need to be taken, and it may surface non-obvious relationships between alerts that could help your teams pinpoint root causes faster.
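To make the idea of clustering concrete, here is a minimal Python sketch. It is not how any particular AIOps product works: it swaps machine learning for a simple rule (same application, alerts close together in time), and the `Alert` fields, the `cluster_alerts` function, and the five-minute window are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # tool that raised the alert, e.g. "MQ", "CICS", "Db2"
    application: str  # hypothetical field: the business application affected
    timestamp: float  # seconds since an arbitrary start point
    message: str


def cluster_alerts(alerts, window_seconds=300.0):
    """Group alerts for the same application that arrive close together.

    An alert joins the current cluster when it arrives within
    window_seconds of the previous alert for that application.
    This is a rule-based stand-in for ML-driven clustering.
    """
    by_app = defaultdict(list)
    for alert in alerts:
        by_app[alert.application].append(alert)

    clusters = []
    for app_alerts in by_app.values():
        app_alerts.sort(key=lambda a: a.timestamp)
        current = [app_alerts[0]]
        for alert in app_alerts[1:]:
            if alert.timestamp - current[-1].timestamp <= window_seconds:
                current.append(alert)
            else:
                clusters.append(current)
                current = [alert]
        clusters.append(current)
    return clusters


# 30 alerts from three tools, all touching the same (hypothetical) application
sample = [
    Alert(source=tool, application="payments", timestamp=i * 5.0, message="threshold exceeded")
    for i, tool in enumerate(["MQ", "CICS", "Db2"] * 10)
]
issues = cluster_alerts(sample)
```

With this sample input, the 30 alerts collapse into a single cluster that spans all three tools, which mirrors the "one issue with 30 alerts" view described above.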
Mainframe AIOps in Action
A multinational financial services organization used Broadcom’s AIOps solutions to improve their signal-to-noise ratio. Initially, the business was getting approximately 240,000 alerts per market day. It was truly impossible to find the needle in a haystack that large. By slashing those hundreds of thousands of alerts down to just a handful each day, they were able to focus on alerts associated with issues that could truly impact customers and, therefore, the business. Those few issues could be efficiently researched and analyzed to determine if any remediation steps were necessary. Without the distractions and stress caused by an avalanche of alerts, the IT team was able to zero in on the alerts that really mattered, keeping the business running smoothly and delivering a great customer experience.
Equipping the Data Center’s First Responders
In issue resolution, operators are the first responders of the data center. Unfortunately, many operators have limited knowledge about the mainframe. AIOps solutions that are inclusive of the mainframe can help bridge this knowledge gap by enabling operators to collect data from multiple products that provide information associated with an alert. By combining that data into an aggregate view, operators have all they need to handle the issue quickly and accurately.
AIOps can also facilitate the capture of subject-matter expert knowledge in decision trees, aiding problem diagnosis and remediation. This allows less-experienced operators to perform at a much higher level than they normally would be able to. Operators can also leverage AI to identify what actions have been effective in the past for similar problems, again empowering a faster response. For well-known issues, intelligent automation can be leveraged to remediate issues without any manual intervention.
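As a rough illustration of capturing expert knowledge in a decision tree, the Python sketch below walks an operator from observed facts to a recommended action. The `Node` structure, the fact names, and the actions are hypothetical examples, not taken from any product or from a real runbook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    question: Optional[str] = None  # key looked up in the observed facts
    yes: Optional["Node"] = None    # branch taken when the fact is true
    no: Optional["Node"] = None     # branch taken when the fact is false or missing
    action: Optional[str] = None    # set on leaf nodes only


def recommend(node, facts):
    """Follow yes/no branches until a leaf with an action is reached."""
    while node.action is None:
        node = node.yes if facts.get(node.question, False) else node.no
    return node.action


# Hypothetical tree an expert might write down for a transaction-rate alert
tree = Node(
    question="rate_above_3x_baseline",
    yes=Node(
        question="single_user_dominates",
        yes=Node(action="Review the process run by that user ID"),
        no=Node(action="Escalate to the CICS subject-matter expert"),
    ),
    no=Node(action="No action; keep monitoring"),
)

advice = recommend(tree, {"rate_above_3x_baseline": True, "single_user_dominates": True})
```

Encoding the expert's questions as data rather than code means the tree can be reviewed and extended without changing the traversal logic.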
Mainframe AIOps in Action
When an alarm sounded at health insurance provider Socialistische Mutualiteiten that there was a 3–4x increase in CICS transaction rates over a 5–10 minute period, Broadcom’s Mainframe Operational Intelligence (MOI) solution enabled data center operators to immediately identify the source of the spike. A single user ID had generated 10,000+ CICS transactions during the alert period. Upon investigation, an operator discovered that the user had developed a process where, after data was entered into an Excel spreadsheet, a system macro was executed that read the data as input. This generated a flood of CICS transactions. To rectify the situation, the system macro was converted to a batch program to prevent CICS usage spikes. The operator stated that it would have been impossible to identify the user and activity causing the high transaction rate issue before MOI.
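The two steps in this example, flagging a rate spike and then finding who caused it, can be sketched in a few lines of Python. This is an illustrative assumption about the logic, not a description of how MOI is implemented. The function names, the baseline figure, and the background traffic are invented; only the 3x threshold and the 10,000-transaction user come from the account above.

```python
from collections import Counter


def is_spike(baseline_per_minute, window_count, window_minutes, factor=3.0):
    """Return True when the rate in the window is at least factor times the baseline."""
    current_rate = window_count / window_minutes
    return current_rate >= factor * baseline_per_minute


def top_contributors(transactions, n=1):
    """Count transactions per user ID and return the n heaviest users.

    transactions is an iterable of (user_id, transaction_name) pairs.
    """
    counts = Counter(user_id for user_id, _ in transactions)
    return counts.most_common(n)


# Hypothetical 10-minute window: one user ID floods the system,
# while 200 other users submit 10 transactions each.
window = [("USER42", "TXN1")] * 10_000
window += [(f"U{i:03d}", "TXN1") for i in range(200) for _ in range(10)]

spike = is_spike(baseline_per_minute=300, window_count=len(window), window_minutes=10)
culprit = top_contributors(window)
```

Here the window holds 12,000 transactions against an assumed baseline of 300 per minute, so the check fires, and the per-user count points at the single ID responsible.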
Enhancing Collaboration for Better Outcomes
AIOps enables a new level of collaboration across personas and platforms and, therefore, offers a huge opportunity for reduction of MTTR. For instance, AIOps improves the ability of operators to understand and manage changes that may impact operational performance — ranging from web-based interfaces to backend CICS modules — with a quality assurance manifest. It also provides operators with broad and deep views related to alerts that enable them to better understand what application and configuration changes may be related to the alert at hand, regardless of platform.
Mainframe AIOps in Action
A global financial services group with 1,500–2,000 application changes per month and zero tolerance for disruptions or downtime wanted to aggressively transform the way their command center operated. They needed centralized event management across platforms and strong correlation between tools to tear down silos and boost collaboration. Broadcom’s AIOps solutions integrated key data sources and provided critical end-to-end observability across their applications, infrastructure, and network, inclusive of their mainframe. This allowed operators in data centers located around the world to improve root cause analysis (RCA), decrease MTTR, and make a radical shift from reactive troubleshooting to proactive issue prevention.
Hit the AIOps Accelerator
Mainframe AIOps can dramatically accelerate issue resolution by enabling you to identify and prioritize the alerts that truly matter. With better visibility and analysis of cross-domain data, you can swiftly discover insights, patterns, correlations, and root causes, boosting productivity and efficiency while reducing MTTR. Potential problems can be anticipated by assessing performance patterns, making the shift from reactive recovery to proactive resolution a reality.
With AIOps for your mainframe, you can boost productivity and avoid expensive downtime and business impacts. The tools and technologies are primed and ready — all you need to do is hit the accelerator. Visit mainframe.broadcom.com/AIOps to get started.
Chris "Spence" Spencer is a product manager for advanced analytics and data science in the Mainframe Software Division at Broadcom. Spence has over 20 years of experience in IT. He worked with IBM's Emerging Technologies group as Watson was transforming from a research project to a commercial offering, giving him a unique perspective on the evolution and impact of machine learning and AI on enterprise environments. Spence is currently responsible for strategy and new technologies from Broadcom focused on leveraging AI for enterprise operations (AIOps).