Sponsored by Broadcom
Mainframe observability bridges operations’ knowledge gaps, leading to clarity, control, and faster problem resolution.
With detailed visibility into the mainframe, the "black box" is now open for those in the organization who need an enterprise-wide view of operational data. This visibility enables site reliability engineers (SREs), software developers, and operations staff to understand and communicate using the same information about their applications.
Breaking Down Silos and Closing the Knowledge Gap
For many years, platform silos were prevalent, and the integration of mainframe data with distributed systems was often misunderstood.
As the industry evolved, the challenge of managing a maturing workforce became increasingly relevant. The modern mainframer is being asked to expand their skill set, taking on additional responsibilities beyond their primary domain or even venturing into new areas of management. Today, users expect to receive insights across multiple tools from various vendors to aid in resolving critical, real-time issues. This complex navigation typically relies on the expertise of subject matter experts (SMEs) — many of whom are transitioning to later stages of their careers, creating a knowledge gap that needs to be addressed.
As we move forward, a pressing question arises: How do we effectively supplement these SMEs with individuals who lack mainframe experience?
Enhanced Observability: The Broadcom WatchTower Platform™
In March 2024, Broadcom’s Mainframe Software Division took a significant step toward addressing these challenges by introducing the Broadcom WatchTower Platform™ for mainframe AIOps. The platform provides enhanced observability for mainframe operations by integrating operational tools, workflows, data, and insights from disparate sources, streamlining critical processes for problem identification and remediation. By harnessing machine learning (ML) and anomaly detection, the WatchTower Platform™ reduces complexity and noise, giving operations staff of all skill levels the resources to manage alerts and resolve issues more efficiently.
Fundamentally, WatchTower is designed to be an enabling environment rather than a standalone product, one that you can customize and implement at your own pace. The platform establishes a cohesive framework that enables domain-specific products to communicate with one another, and it provides an analytical engine to correlate data across domains.
WatchTower introduces new value to the tools already in use, contextualizing and visualizing the interconnectedness of data. Accelerating insights and enhancing the overall understanding of how these diverse data elements work together ultimately drives better decision-making and operational efficiency.
Enabling the Next Generation of Mainframers
As we welcome new personnel onto our mainframe teams, we often encounter a significant knowledge transfer gap. Much of the crucial information resides in the minds of SMEs or is scattered across runbooks and manuals. A significant challenge is the lack of accessible information that illustrates how various mainframe components are interconnected from an application perspective. This information should be easily shareable, serving as a valuable reference asset. There is an urgent need for auto-generated visibility into the interconnected hardware and software assets within mainframe operations, along with their resources and dependencies.
WatchTower addresses this challenge directly. In one example, a customer experienced significant performance degradation in their enterprise application. An SME used a diagnostic tool to identify that a messaging queue (MQ) was backing up and not being processed by a subsequent application call. When the SME communicated this finding to the application development team, they initially believed that their application had no interaction with the MQ in its workflow. However, the SME provided an auto-discovered topology map of their environment within WatchTower, clearly illustrating the connectivity and dependencies involving the MQ and its processing requirements. With this information, the application development team revisited their lab, and with the SME’s assistance, they were able to resolve the specific application issue more quickly.
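To make the idea of a topology map concrete, here is a minimal sketch in Python of how a dependency graph can answer the question the development team raised: does our application touch the queue at all? The component names and the graph itself are hypothetical, invented for illustration. The sketch does not represent WatchTower's data model or any product API.

```python
from collections import deque

# Hypothetical topology: each key calls into the components listed for it.
# All names are illustrative assumptions, not taken from a real environment.
TOPOLOGY = {
    "web-frontend": ["order-service"],
    "order-service": ["ORDERS.REQUEST.QUEUE"],
    "ORDERS.REQUEST.QUEUE": ["cics-order-txn"],
    "cics-order-txn": ["db2-orders"],
    "db2-orders": [],
    "reporting-batch": ["db2-orders"],
}


def find_path(topology, source, target):
    """Breadth-first search: shortest dependency path from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in topology.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def dependents_of(topology, target):
    """Every component that reaches target, directly or through other components."""
    return sorted(
        name for name in topology
        if name != target and find_path(topology, name, target) is not None
    )


if __name__ == "__main__":
    # Shows the chain linking the front end to the queue.
    print(find_path(TOPOLOGY, "web-frontend", "ORDERS.REQUEST.QUEUE"))
    # Lists every component with a stake in the queue's health.
    print(dependents_of(TOPOLOGY, "ORDERS.REQUEST.QUEUE"))
```

In this toy graph the front end reaches the queue only indirectly, through the order service. That is the kind of hidden dependency an auto-discovered map can surface for a team that believes it has no interaction with MQ.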
Ensuring SREs Understand Revenue-Generating Processes and Application Throughput/Workflow
Almost all applications span the enterprise, including the mainframe. The business line responsible for a specific set of applications must understand the customer experience to ensure they are serving the end customer efficiently. To achieve this, the SRE for the business line requires visibility into the entire application flow to identify opportunities for improvement and enable faster responses to issues.
With the data streaming capability of WatchTower, any type of diagnostic application trace can be followed through the mainframe. Mainframe health metrics can then be shared via OpenTelemetry or solution APIs with the enterprise observability solution in a correlated manner. This helps us understand application degradation and its connection to environmental factors. Faster detection and remediation ensure that the business line effectively serves its customers while meeting their service level agreements.
With Broadcom’s WatchTower Platform™, you can leverage its real-time information streaming capability (z/IRIS) to seamlessly integrate performance data from mainframe applications into relevant enterprise observability tools. This integration enables end-to-end transaction workflow monitoring and real-time performance management. Mainframe health metrics are then shared via OpenTelemetry or solution APIs with the enterprise observability solution, allowing for correlation with environmental factors. One of our customers has implemented the real-time streaming capability as a replacement for a mainframe-type agent from a third-party observability provider, supplementing their operational dashboards with additional mainframe health data.
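For readers unfamiliar with what a metric shared via OpenTelemetry looks like on the wire, the following Python sketch builds a single gauge data point in the OTLP/JSON encoding using only the standard library. The metric name, attribute keys, scope name, and LPAR name are assumptions chosen for illustration. How WatchTower itself produces and ships this data is not described here.

```python
import json
import time


def otlp_gauge_payload(service_name, metric_name, value, unit, attributes,
                       time_unix_nano=None):
    """Build an OTLP/JSON metrics payload carrying one gauge data point."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()

    def kv(key, val):
        # OTLP attributes are key/value pairs with a typed value wrapper.
        return {"key": key, "value": {"stringValue": str(val)}}

    return {
        "resourceMetrics": [{
            "resource": {"attributes": [kv("service.name", service_name)]},
            "scopeMetrics": [{
                # Hypothetical instrumentation scope name.
                "scope": {"name": "example.mainframe.health"},
                "metrics": [{
                    "name": metric_name,
                    "unit": unit,
                    "gauge": {
                        "dataPoints": [{
                            "asDouble": float(value),
                            # 64-bit integers are encoded as strings in OTLP/JSON.
                            "timeUnixNano": str(time_unix_nano),
                            "attributes": [kv(k, v) for k, v in attributes.items()],
                        }]
                    },
                }],
            }],
        }]
    }


if __name__ == "__main__":
    # Hypothetical example: depth of a message queue on a production LPAR.
    payload = otlp_gauge_payload(
        service_name="lpar-prod1",
        metric_name="mq.queue.depth",
        value=1250,
        unit="{messages}",
        attributes={"queue.name": "ORDERS.REQUEST.QUEUE"},
    )
    print(json.dumps(payload, indent=2))
```

An OpenTelemetry Collector with an OTLP/HTTP receiver would typically accept a payload of this shape as a POST to /v1/metrics with a Content-Type of application/json. Because the data point carries attributes such as the queue name, the enterprise observability tool can correlate it with traces and metrics from distributed components.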
Empowering Mainframe Developers to Optimize and Modernize Applications
As a mainframe developer introduces new capabilities, it’s crucial to understand the effects of any changes on the production environment. This connectivity defines our approach to DevOps. The WatchTower application profiler is a key capability designed to monitor the execution of large business workloads and bridge the gap in application performance information. It discovers application workflows, collects data, and provides insights into application behavior, enabling the achievement of desired business service outcomes.
Diverse Observability Capabilities to Meet Customer Needs
By utilizing various components of the WatchTower Platform™, customers have been able to enhance operational efficiency, refine decision-making processes, and drive overall business value. This aligns with Broadcom’s goal of enhancing the value of your products by delivering capabilities that target specific problem areas and can be deployed individually, each with significant business value. It’s only been a year, but with the WatchTower Platform™, we continue to deliver observability capabilities designed to meet customers’ evolving business needs.
James “JD” Bagnell is a Value Stream Offering Manager within Broadcom’s Mainframe AIOps division, where he leads performance and automation initiatives across the mainframe ecosystem. With more than 30 years of experience in enterprise software development and product management, JD brings a powerful blend of technical expertise and strategic business insight. He has a proven track record in launching new technologies, driving enterprise-scale innovation, and developing go-to-market strategies that translate complex capabilities into customer value. A strong advocate for change and continuous improvement, JD excels at building compelling business cases for products that solve real-world challenges. As a frequent speaker at industry events such as GSE and SHARE, JD shares practical, forward-thinking strategies to strengthen IT infrastructure resilience.