The Convergence of Big Data and HPC


In this special guest feature, Barry Bolding, Senior VP and Chief Strategy Officer at Cray Inc., discusses a highly germane topic for many enterprises today: the intersection of big data and high performance computing. Barry Bolding serves as Senior Vice President and Chief Strategy Officer, responsible for Cray’s strategic planning and corporate development. Dr. Bolding was appointed vice president in 2009 and was responsible for product management, corporate and product marketing for high performance computing solutions, and storage and data management. Prior to 2009, he served as Cray’s director of product marketing, analyzing future products and developing long-term strategies. Over the course of his career, Dr. Bolding has worked with key customers in government, academia and commercial markets and held positions as a scientist, applications specialist, systems architect and presales product and marketing manager. He first joined Cray Research, Inc., in 1992 and later worked with Network Computing Services and IBM, returning to Cray in 1999. Dr. Bolding holds a B.S. in chemistry from the University of California, Davis and a Ph.D. in chemical physics from Stanford University.

Statistics, a field of study that has not often excited the masses, has been revitalized by the age of big data. Every individual is now exposed to as much information in a single day as our 15th century ancestors were in their entire lifetimes. Roughly 2.5 quintillion new bytes of data, an unimaginably large number, are created every day. And not only is the amount of data skyrocketing, so is its velocity: some 90% of the data in the world was created in the last two years alone.

Data is growing and moving faster than ever, while at the same time becoming obsolete faster than ever. CIOs and their companies face substantial hurdles in getting a handle on their data, and fast.

The obvious challenge is how to analyze your data quickly and effectively, to gain insight into the problems you face daily and thus better manage your business. This essential need to parse mountains of data has led to an explosion of AI and machine learning companies over the past two years. In just three quarters (Q1 through Q3 2015), $47.2 billion was invested in AI and machine learning, with roughly 900 companies tackling problems in business intelligence, finance and security.

And while machine learning has captured a lot of attention, there’s an equally important element to running predictive analytics, particularly when time-to-result is crucial to the business mission: high performance computing. The convergence of analytics, big data and HPC, or “data intensive computing,” is essential when you need to compute, store and analyze enormous, complex data sets very quickly in a highly scalable environment.

Firms in manufacturing, financial services, weather forecasting, cyber-reconnaissance, life sciences & pharmaceuticals, energy exploration and more are all using the data intensive power of supercomputers to push the envelope for research and discovery, and to answer questions that are not practical to answer using any other means.

Data Intensive Computing

There are a number of reasons why these organizations turn to data intensive computing. Let’s look into three of them:

Improving product development & design

The convergence of big data and HPC, particularly in manufacturing, is having a remarkable impact on product development and design. Capturing data from both physical tests and customer feedback enables auto companies, for example, to improve product quality and driver experience. Testing the aerodynamics of a new model through simulation and data analysis enables manufacturers to make changes far more quickly than when running physical tests. Modeling wind flow, running structural analysis, predicting fuel consumption and more have become indispensable tools in the manufacture of faster (in the case of Formula One), safer, more efficient vehicles.

From a different perspective, high performance data analytics can give life science organizations valuable insight into genomics and the progress of disease treatment, which ultimately feeds into drug discovery. The nonprofit Broad Institute of MIT and Harvard now generates 14 gigabytes of data every minute from its genome sequencers. Using a big data analytics hardware appliance equipped with HPC components, the institute has cut the time to produce Quality Score Recalibration (QSR) results from its genome analysis toolkit “GATK4” and the Apache Spark pipeline from 40 minutes down to nine.
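To make the parallelism behind that speedup concrete, here is a minimal PySpark sketch of a quality-score recalibration pass. It is not GATK4 code; the input path, the tab-separated schema and the simple per-quality-bin adjustment are all assumptions made for illustration. What it does show is the data-parallel shape of a Spark pipeline: load, group, aggregate and join across a cluster.

```python
# Minimal sketch, NOT the actual GATK4 implementation: a data-parallel
# quality-score recalibration pass expressed as a Spark pipeline.
# The input path, column layout and the simple empirical adjustment
# per quality bin are assumptions made for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("qsr-sketch").getOrCreate()

# Hypothetical per-read metrics: read id, reported quality score,
# observed mismatch count and total base count.
reads = (spark.read
         .option("sep", "\t")
         .csv("hdfs:///data/read_metrics.tsv")
         .toDF("read_id", "reported_q", "mismatches", "bases")
         .withColumn("reported_q", F.col("reported_q").cast("int"))
         .withColumn("mismatches", F.col("mismatches").cast("long"))
         .withColumn("bases", F.col("bases").cast("long")))

# Compute an empirical quality per reported-quality bin across the whole
# cluster; this wide aggregation is where the parallelism pays off.
recal_table = (reads.groupBy("reported_q")
               .agg(F.sum("mismatches").alias("errors"),
                    F.sum("bases").alias("total"))
               .withColumn("empirical_q",
                           -10.0 * F.log10((F.col("errors") + 1) /
                                           (F.col("total") + 2))))

# Join the recalibration table back onto the reads and persist the result.
(reads.join(recal_table.select("reported_q", "empirical_q"),
            on="reported_q", how="left")
      .write.mode("overwrite")
      .parquet("hdfs:///data/reads_recalibrated"))
```

The real GATK4 Spark tools operate on aligned reads and covariate tables rather than a flat file, but the shape of the computation, a wide aggregation followed by a join, is what HPC-class storage and interconnects accelerate.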

The limits of scalability

The promise of data intensive computing is that it can bring together the technologies of traditional supercomputing, where scalability is king, with the best new data analytics technologies from the broader community, and thus provide platforms that solve the most complex problems we face. Application scalability can only be achieved if the networking and memory features of the system are efficient, scalable and large. Globally addressable memory and the low-latency network technologies developed for supercomputing bring to analytics the ability to reach new levels of scale. Cloud, by contrast, has flexibility and feature richness as its pinnacle virtues; to maximize them, it sacrifices user architectural control and consequently falls short for applications that demand scale and complexity. Companies across all verticals need to find the right balance between the flexibility of cloud and the power of scalable systems. The proper balance will yield the best ROI, the maximum business advantage and, ultimately, segment leadership in a competitive world dominated by high performance data analytics.
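A back-of-the-envelope model helps show why interconnect latency, not raw compute, often decides how far an analytics application scales. The sketch below is a toy cost model, not a benchmark; every number in it is an illustrative assumption.

```python
# Toy scaling model (illustrative assumptions, not measured data):
# time per iteration = serial work + parallel work / nodes + communication,
# where the communication term grows with node count and network latency.

def time_per_iteration(nodes, compute_s=100.0, serial_frac=0.02,
                       latency_us=50.0, msgs_per_node=8):
    """Estimated seconds per iteration on `nodes` nodes."""
    serial = compute_s * serial_frac
    parallel = compute_s * (1.0 - serial_frac) / nodes
    comm = nodes * msgs_per_node * latency_us * 1e-6  # data-exchange phase
    return serial + parallel + comm

for latency_us, label in [(50.0, "commodity cloud network"),
                          (1.5, "low-latency HPC interconnect")]:
    sweet_spot = min(range(1, 4097),
                     key=lambda n: time_per_iteration(n, latency_us=latency_us))
    print(f"{label}: fastest at ~{sweet_spot} nodes "
          f"({time_per_iteration(sweet_spot, latency_us=latency_us):.2f} s/iteration)")
```

Under these made-up parameters the low-latency system keeps getting faster well past the node count where the high-latency one stalls, which is the essence of the scalability argument above.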

Data intensive computing… as a service

Just as cloud is a delivery mechanism for generic computing, the results of data-intensive, scalable systems can now be delivered without capital acquisition. A breakthrough threat analytics service offered by Deloitte Advisory Cyber Risk Services takes a different approach to HPC and analytics. Deloitte uses the high performance technologies of Spark™, Hadoop® and the Cray Graph Engine, all powered by the Urika-GX analytics platform, to provide insight into how your IT infrastructure and data look to an outside aggressor. Most importantly, this service is available through a subscription-based model as well as through system acquisition.

Deloitte’s platform combines supercomputing technologies with a software framework for analytics, and it is designed to help companies discover, understand and take action against cyber adversaries. The US Department of Defense is actively using it to give remediation experts intelligence reports with actionable insights on potential threat vectors.
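For readers unfamiliar with graph-based cyber reconnaissance, the sketch below illustrates the general idea on a handful of hypothetical flow records. It uses networkx as a small-scale stand-in; the service described above runs Spark, Hadoop and the Cray Graph Engine on Urika-GX precisely because real flow graphs are many orders of magnitude larger.

```python
# Illustrative sketch only, using networkx as a small-scale stand-in for the
# graph engines mentioned above. The flow records are hypothetical.
import networkx as nx

flows = [  # (source host, destination host, destination port)
    ("10.0.0.5",    "10.0.1.20", 445),
    ("10.0.0.5",    "10.0.1.21", 445),
    ("10.0.0.5",    "10.0.1.22", 445),
    ("203.0.113.9", "10.0.1.20", 22),
    ("10.0.1.20",   "10.0.2.7",  1433),
]

g = nx.DiGraph()
for src, dst, port in flows:
    g.add_edge(src, dst, port=port)

# Hosts with unusually high fan-out can indicate scanning or lateral movement.
fan_out = sorted(g.out_degree(), key=lambda kv: kv[1], reverse=True)
print("highest fan-out:", fan_out[:3])

# PageRank-style centrality highlights hosts that many paths converge on,
# i.e. the assets an outside aggressor is most likely to care about.
centrality = sorted(nx.pagerank(g).items(), key=lambda kv: kv[1], reverse=True)
print("most central hosts:", centrality[:3])
```

At production scale the same kinds of queries, multi-hop traversals and whole-graph analytics, are exactly the workloads that benefit from the globally addressable memory and low-latency networking discussed earlier.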

While this is a security-specific solution, it’s also an indicator of how quickly technology is changing. Analytics-as-a-Service is quickly evolving, and new HPC solutions are coming to market.

HPC: The Answer to Unresolved Questions

Ultimately, the decision to consider a data intensive computing solution comes down to the amount of data you have and the speed with which you need to analyze it.

For CIOs tackling the world’s most complex problems, harnessing, parsing, and analyzing data to glean previously unknown insights provides a distinct competitive advantage. Fast-moving datasets help inform strategy decisions, spur innovation, inspire new products, enhance customer relationships and more.

So if you’re struggling to keep your existing analytics framework productive, it may be time to develop a data intensive computing strategy.

