By Pedro Pereira
The amount of data the digital world generates is expected to grow at a 44 percent rate over the coming decade. In his first post of a three-part series for SHARE President’s Corner, veteran tech writer Pedro Pereira explores the Big Deal about Big Data…
By 2020, there will be 35 zettabytes of data, a staggering increase from the current estimated 1.2 zettabytes. Let’s put that in perspective: 35 zettabytes equals roughly 35 million petabytes. One petabyte is slightly more than 1 billion megabytes. That adds up to 7 quadrillion (that’s 15 zeroes after the 7) sets of Shakespeare’s complete works and roughly 3.5 billion U.S. Library of Congress print collections, which are estimated at 10 terabytes per collection.
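For readers who want to check the unit arithmetic, here is a minimal sketch in Python. It assumes the binary convention (each unit is 1,024 times the previous one), under which 35 zettabytes comes to about 36.7 million petabytes, in line with the rounded figure above. The variable names are illustrative only.

```python
# Binary unit convention: each step up is a factor of 1,024 (an assumption;
# decimal units would give slightly different figures).
KB = 1024
MB = KB ** 2
PB = KB ** 5
ZB = KB ** 7

projected_2020 = 35 * ZB        # projected data volume by 2020
current_estimate = 1.2 * ZB     # current estimated data volume

petabytes_2020 = projected_2020 / PB          # zettabytes expressed in petabytes
megabytes_per_petabyte = PB / MB              # "slightly more than 1 billion"
growth_factor = projected_2020 / current_estimate

# Size of one set of Shakespeare's works implied by the 7-quadrillion figure.
shakespeare_sets = 7 * 10**15
implied_set_size_mb = projected_2020 / shakespeare_sets / MB

print(f"35 ZB in petabytes:      {petabytes_2020:,.0f}")
print(f"Megabytes per petabyte:  {megabytes_per_petabyte:,.0f}")
print(f"Growth factor:           {growth_factor:.1f}x")
print(f"Implied size per set:    {implied_set_size_mb:.1f} MB")
```

The last line works backward from the Shakespeare comparison: 7 quadrillion sets in 35 zettabytes implies a little over 5 megabytes per set, a plausible size for a plain-text edition of the complete works.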
Big data is a worldwide phenomenon that touches everyone who uses a cell phone, searches the web, trades on Wall Street, formats a report, streams video or types on a computer.
“Every click of ours is generating data,” says Anjul Bhambhri, IBM’s vice president of big data products. “When the Internet boom started, I don’t think at that time people anticipated that so much unstructured data was going to be created.”
We all contribute to the explosion of big data, loosely defined as datasets that grow beyond the ability of run-of-the-mill database tools to handle. “In a digitized world, consumers going about their day—communicating, browsing, buying, sharing, searching—create their own enormous trails of data,” states a May 2011 McKinsey Global Institute report.
The number of servers deployed around the world grew six-fold in the past decade to 32.6 million worldwide, according to Dr. Gururaj Rao, IBM fellow, Systems and Technology Group. Storage grew 69 percent in the same time period, Dr. Rao said during a SHARE conference in August 2011. Meanwhile, he noted, the number of Internet-connected devices is growing at a 42 percent yearly clip.
Big Deal, Indeed
Big data poses a seemingly insurmountable challenge for enterprises across a gamut of industries – retail, finance, healthcare, manufacturing, communications and government – that must make sense of the volumes of information they produce, volumes that grow exponentially every second. Most of the data – 80 percent or so – is unstructured, which complicates the ability to store, mine, analyze and act on it.
But let’s say you could do all that efficiently. What would be the benefit? What is the big deal about big data?
Benefits range from the mundane to seemingly pie-in-the-sky scenarios: Better targeted consumer products. Improved road traffic flow and urban planning. Catching and fixing potentially dangerous automobile flaws. Preventing credit card fraud. Predicting infection in at-risk newborns. Saving lives.
Enterprises would develop better, safer products they can more precisely target to customers. The healthcare industry, McKinsey posited in its report, could use big data to boost efficiency and quality while reducing costs by 8 percent. Retailers could boost operating margins by more than 60 percent.
Robert Rosen, a former SHARE president currently working in the government, says big data analysis led to a recent Volkswagen recall of nearly 170,000 diesel VWs and Audis over potentially faulty fuel lines. “There’s an example of extracting information from lots of unstructured data,” he says.
Where to Store It All
The benefits of big data analysis surely go beyond what we can imagine, but it poses some big challenges. Organizations have to figure out where to store it all and implement recovery policies and technology. Industries such as healthcare, law and finance are required to archive certain types of digital information and have it easily accessible for recovery in cases such as legal disputes and, of course, data loss.
With data growing at a projected 44 percent clip, it would take millions of storage systems to handle it all. According to the McKinsey report, the United States in 2010 had 16 exabytes of storage capacity, while Europe had 11. Combined, Europe and the United States could store only a fraction of the currently existing 1.2 zettabytes of data, since 1 zettabyte equals 1,024 exabytes.
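To make that fraction concrete, here is a rough Python sketch using only the figures quoted above. The second calculation assumes the 44 percent growth rate compounds annually, which the sources cited here do not state explicitly, so treat that result as illustrative.

```python
import math

EB_PER_ZB = 1024                 # 1 zettabyte = 1,024 exabytes

us_storage_eb = 16               # U.S. storage capacity in 2010 (McKinsey)
europe_storage_eb = 11           # European storage capacity in 2010 (McKinsey)
existing_data_eb = 1.2 * EB_PER_ZB

combined_eb = us_storage_eb + europe_storage_eb
fraction_storable = combined_eb / existing_data_eb

# ASSUMPTION: 44 percent is a yearly compound rate. Under that assumption,
# how many years does it take to grow from 1.2 to 35 zettabytes?
annual_growth = 0.44
years_to_35_zb = math.log(35 / 1.2) / math.log(1 + annual_growth)

print(f"Combined capacity: {combined_eb} EB")
print(f"Share of existing data that fits: {fraction_storable:.1%}")
print(f"Years to reach 35 ZB at 44% a year: {years_to_35_zb:.1f}")
```

On these numbers, the two regions together could hold a little over 2 percent of the data that already exists.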
“While 75 percent of the information in the digital universe is generated by individuals, enterprises have some liability for 80 percent of information in the digital universe at some point in its digital life,” according to the IDC Digital Universe Study (sponsored by EMC, June 2011).
The same IDC study continues: “The number of ‘files,’ or containers that encapsulate the information in the digital universe, is growing even faster than the information itself as more and more embedded systems pump their bits into the digital cosmos. In the next five years, these files will grow by a factor of 8, while the pool of IT staff available to manage them will grow only slightly.”
“We are not going to have enough disks to store all this data,” says Rosen.
Vendors such as IBM, Samsung and GE Global are working hard on developing new technology. Be it laser-based, crystal disks, atomic holographic nanotechnology or something we don’t know about yet, the future of storage technology is critical to our ability to collect, organize and analyze big data. You can increase disk density by only so much, and once we reach the limit, says Rosen, “we’ll need something new.”
In the next installment of the Big Deal About Big Data, Pedro Pereira continues his conversation with experts in the field, who discuss the issues of securing and analyzing all that information.