You may have clocked the reference to the great Oasis album from 1995, (What’s the Story) Morning Glory?
If you don’t know the music, go and listen (it’s brilliant), but only after you’ve read this article. For now, I want to share a different story.
IBM announced the Telum processor at Hot Chips in August 2021. It went on to become the processor chip for the recently announced IBM z16. Telum, a 7 nm microprocessor, was built to meet many end-user requirements, notably the ability to gain artificial intelligence (AI)-based insights from data without compromising response times for high-volume transactional workloads. It’s the future, kids.
So, here’s the Telum story.
It’s designed with a new dedicated on-chip accelerator for AI inference, enabling real-time AI embedded directly in transactional workloads, as well as offering improvements for performance, security, and availability. The microprocessor contains:
- Eight processor cores, clocked at over 5 GHz, each supported by a redesigned 32 MB private Level-2 cache.
- Level-2 caches that interact to form a 256 MB virtual Level-3 cache and a 2 GB virtual Level-4 cache.

Along with improvements to the processor core itself, the 1.5x growth in cache per core over the IBM z15 is designed to enable a significant increase in both per-thread performance and total capacity. These performance improvements are vital for rapid response times in complex transaction systems, especially when augmented with real-time AI inference.
Telum also features significant innovations in security, with transparent encryption of main memory. Telum’s Secure Execution (TSE) improvements provide increased performance and usability for hyper protected virtual servers and trusted execution environments, making Telum a top choice for processing sensitive data in hybrid cloud architectures.
The predecessor IBM z15 chip was designed to enable industry-leading seven-nines availability for IBM Z systems. Telum is engineered to improve further on availability, with innovations like a redesigned 8-channel memory interface capable of tolerating complete channel or DIMM failures and designed to transparently recover data without impacting response time.
The AI Advantage
IBM has a long history of embedding purpose-built accelerators, such as zEDC and SORT, into its hardware designs to improve the performance of common tasks. Telum adds a new integrated AI accelerator with more than 6 TFLOPS of compute capacity per chip. Every core has access to this accelerator and can dynamically leverage its entire compute capacity to minimize inference latency. Thanks to the centralized accelerator architecture, with its direct connection to the cache infrastructure, Telum enables extremely low-latency inference for response-time-sensitive workloads.
Keeping data on the processor itself offers many latency and data protection advantages. The Telum processor is designed to help users maximize these benefits, providing low and consistent latency for embedding AI into response-time-sensitive transactions. This can enable you to leverage the results of AI inference to better control the outcome of transactions before they are completed.
For example, a business can use AI to analyze credit card transactions, predict which of them carry high risk, and block those if required. Being able to analyze these transactions without impacting response times is critical when dealing with dubious transactions, helping businesses avoid costly consequences and negative business impacts.
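To make the idea concrete, here is a minimal sketch of scoring a card transaction inside the authorization path. The `score_risk` function is a hypothetical rule-based stand-in for a trained model (the article does not describe any specific model); in a Telum deployment, the inference call would run on the on-chip AI accelerator so the decision completes within the transaction's response-time budget.

```python
def score_risk(txn: dict) -> float:
    """Toy risk score in [0, 1]. A real system would run a trained
    model here; these rules are purely illustrative."""
    risk = 0.0
    if txn["amount"] > 5_000:
        risk += 0.6  # unusually large payment
    if txn["country"] != txn["home_country"]:
        risk += 0.3  # cross-border use
    return min(risk, 1.0)

def authorize(txn: dict, block_threshold: float = 0.8) -> str:
    """Score the transaction and block it before completion if the
    risk exceeds the threshold."""
    return "BLOCK" if score_risk(txn) >= block_threshold else "APPROVE"

print(authorize({"amount": 9_000, "country": "BR", "home_country": "UK"}))  # BLOCK
print(authorize({"amount": 42.0, "country": "UK", "home_country": "UK"}))   # APPROVE
```

The key design point is that scoring happens synchronously, inside authorization, rather than flagging fraud after the fact in a batch job.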
Harnessing AI and Machine Learning to Detect Fraud
Dodgy transactions can be detected and prevented by fraud detection algorithms that analyze large volumes of digital transaction data. This is done in digital payment transactions and the e-commerce sector to stop fraudsters from compromising other customer accounts. Both supervised and unsupervised algorithms are used to monitor and analyze these transactions, looking for suspicious activity in user accounts and sending alerts to individuals.
Supervised machine learning (ML) is trained on “labeled” data; from that dataset, the algorithm learns to predict the output. By contrast, unsupervised ML learns from untagged data. Unsupervised algorithms are used when labeled transaction data is non-existent or improperly tagged; they help discover outliers, which in turn helps detect unusual patterns. In this way, AI enables the payments industry to process large numbers of transactions with low error rates.
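The unsupervised case can be illustrated with a toy outlier detector: no labels are needed, only the shape of the data itself. This sketch uses a simple z-score rule on transaction amounts as a stand-in for a real unsupervised model (production systems would use far richer features and algorithms); the threshold and data are purely illustrative.

```python
# Toy unsupervised outlier detection on transaction amounts.
# No fraud labels are used: a point is flagged purely because it sits
# far from the rest of the distribution.
from statistics import mean, stdev

def flag_outliers(amounts: list[float], z_threshold: float = 2.5) -> list[int]:
    """Return the indices of amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    return [i for i, a in enumerate(amounts)
            if sigma > 0 and abs(a - mu) / sigma > z_threshold]

# Nine ordinary card payments and one anomalous 950.00 payment.
amounts = [12.5, 9.9, 14.2, 11.0, 10.7, 13.1, 950.0, 12.0, 8.8, 11.5]
print(flag_outliers(amounts))  # [6] — only the 950.00 payment stands out
```

A supervised model would instead be fitted to historical transactions already labeled “fraud” or “genuine,” and would predict that label for new transactions.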
Happily Ever After?
One international bank already uses AI on IBM Z as part of its credit card authorization process instead of using an off-platform inference solution. As a result, the bank can detect fraud during its credit card transaction authorization processing.
For the future, it’s looking to attain sub-millisecond response times, exploiting complex deep learning AI models while maintaining the critical scale and throughput needed to score up to 100,000 transactions per second — nearly a 10X increase over what it can achieve today. The bank wants consistent and reliable inference response times, with low millisecond latency to examine every transaction for fraud. Telum is designed to help meet such challenging requirements, specifically of running combined transactional and AI workloads at scale.
In today’s IT landscape, in a world increasingly underpinned by digital transformation, most stories start with the data that we have. The plot twists, and an exciting conclusion, generally come with how we leverage that data and profit — a.k.a. maximum insight. Knowing how to properly utilize AI and ML, and crucially having the infrastructure ready-made to support such approaches, has become the new standard in computing. And Telum is an important new chapter in our efforts to become future-ready.
The Hot Chips Symposium is one of the semiconductor industry’s leading conferences on high-performance microprocessors and related integrated circuits.