Big Data Analytics EXTRACTING INSIGHTS FROM - GP Bullhound

INDEPENDENT TECHNOLOGY RESEARCH SECTOR UPDATE  NOV 2013  SOFTWARE

Big Data Analytics
EXTRACTING INSIGHTS FROM EXABYTES

Analytics is entering a new era
Amidst the hype surrounding “Big Data”, a perfect storm of major trends is allowing organisations across all industries to combine advanced Analytics with new sources of external data to extract valuable insights capable of monetisation. We believe the Analytics market is entering a new era, where technology is capable of supporting data-driven business, in real-time.

Big Data technologies reaching enterprise-readiness
A wave of innovative technology is powering a new generation of Analytics solutions. This wave spans software, hardware, and even entire computing paradigms, all of which are now reaching ‘enterprise-readiness’, and hence creating the conditions for mainstream adoption. Hadoop, in-memory computing, and cloud Infrastructure-as-a-Service are key Analytics enablers where enterprise adoption is currently limited but the potential is dramatic.

Predictive Analytics and Visualisation taking centre stage
We see several vendors in Europe well positioned in the Big Data software market, where the applications segment is forecast to grow at a 56% CAGR (2011-2017) to reach $7.4bn by 2017. Our interviews with over 30 technology vendors, their customers, and investors reveal that Predictive Analytics and Visualisation are the top two emerging sub-segments.

$1.4bn of funding in the last 12 months and growing
Big Data Analytics is one of the hottest sectors globally for VC and growth capital investment. Investment activity has grown over 200% year-over-year with $1.4bn of capital deployed into the sector in the last twelve months alone.

Another phase of Analytics M&A is underway
We believe the market is entering a new phase of consolidation as major Business Intelligence vendors pursue inorganic growth and seek to acquire Big Data Analytics capabilities.

HUGH CAMPBELL, London: +44 207 101 7566
ALEXIS SCORER, London: +44 207 101 7593
WILL SHELDON, London: +44 207 101 7660

Important disclosures appear at the back of this report.
GP Bullhound LLP is authorised and regulated by the Financial Conduct Authority and the Prudential Regulation Authority.

Table of Contents
The Evolution of Business Intelligence and Analytics
Introduction
Defining Big Data
First Phase of Analytics: Business Intelligence
Comparing BI to Big Data
Second Phase of Analytics: Big Data comes of age
How big is Big?
Big Data expected to drive big value
Market Size and Growth
Analytics Enters a New Era
Third phase of Analytics: Data-driven business
Big Data Infrastructure: Now ready for large scale adoption
Visualisation: Unlocking Analytics for business users
Predictive Analytics: Analytics for the future
Industry Landscape
Investment and Acquisition Dynamics
Investment activity reaching new heights
M&A: Poised for a new wave of consolidation
Selected Private Placements
Selected M&A transactions
Selected Company Profiles
Analytics, Visualisation and Big Data
Predictive Analytics
Service Providers


GP BULLHOUND BIG DATA ANALYTICS – EXTRACTING INSIGHTS FROM EXABYTES

THE EVOLUTION OF BUSINESS INTELLIGENCE AND ANALYTICS

INTRODUCTION

“The march of quantification, made possible by enormous new sources of data, will sweep through academia, business, and government. There is no area that is going to be untouched.” Professor Gary King, Institute for Quantitative Social Science, Harvard University [1]

While the concept of analysing data to derive insights in business is nothing intrinsically new, we believe a convergence of new technologies has recently ignited an Analytics revolution. Today’s emerging wave of Analytics technology is widely expected to have a dramatic impact on businesses across all industries, and also has the potential to revolutionise healthcare and significantly impact the public sector. We see several companies ideally poised to execute on this Analytics revolution.

‘Analytics’ as an area of technology encompasses hardware, to physically store data and power computations; software, to intelligently store and analyse data on given hardware; and services, to help users leverage the former two. This report focuses on the software segment of the market. While much of the buzz around Big Data software has been on the solutions for storing data (e.g. Hadoop, NoSQL), we believe that software to analyse Big Data, particularly to give predictive insights in real time, will become a key area of growth and investment.

The Analytics software market is entering its third phase. The initial phase, running from the 1950s to the mid ‘00s, saw the emergence of Business Intelligence solutions designed to deliver reports based on internal business data. The second phase, starting around 2008, was when Big Data as a concept entered the corporate mind-set, and saw companies predominantly concerned with capturing and analysing more data than ever before. The third phase, starting now, is about driving predictive insights in real-time.

EXHIBIT 1 – ANALYTICS: ENTERING ITS THIRD PHASE
Source: GP Bullhound, Tom Davenport

[1] Lohr S. “The Age of Big Data” New York Times (February 2012)
[2] Davenport T. “Preparing for Analytics 3.0” The Wall Street Journal (February 2013)

GP Bullhound LLP

DEFINING BIG DATA

Big Data has now become a ubiquitous term but remains ambiguous, and no universally agreed definition currently exists. Most definitions refer to the dimensions of data volume, velocity, and variety first outlined in a 2001 paper by META Group (now Gartner) analyst Doug Laney. [3] Sufficient magnitude in any one of these dimensions – sheer amount of data; speed with which it arrives; and in particular, data types beyond traditional structured data – can create a Big Data challenge, requiring advanced technology. Companies have been managing large volumes of data for some time, so Big Data challenges typically relate to more than just raw data volume.

Analytics is a critical component of Big Data, and our preferred definition was advanced by researchers at the University of St Andrews, who surveyed and distilled the many Big Data definitions to have gained traction into: “The storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.” [4]

Whilst Big Data will continue to mean different things to different people and evolve over time, the fixation on the “volume” aspect of Big Data helps explain why much of the initial hype has focused on the technologies relating to data storage and processing. We are now seeing a major shift in emphasis towards the analysis aspects of Big Data. To understand current trends and the possible future direction in Big Data Analytics, it is helpful to explore a brief history of Business Intelligence and computing as a decision making tool within business.

FIRST PHASE OF ANALYTICS: BUSINESS INTELLIGENCE

Analytics have been used in business since the late 19th century, when American industrialist Frederick Winslow Taylor began conducting experiments on metal-cutting machinery to improve efficiency at the Midvale Steel Company in Pennsylvania. [5] Henry Ford is said to have carried out time management analysis on the Model T assembly line when it was first produced in 1908. It was not until the middle of the twentieth century and the advent of computing that Analytics began to command more attention.

Decision Support Systems evolved as a field of study from research at the Carnegie Institute of Technology and MIT in the late 1950s and early 1960s. [6] In the mid-1960s, Scott Morton, a Scottish engineering student at Harvard Business School, built what is thought to be the first model-driven Decision Support System to help managers make business planning decisions. The research coincided with the development of second generation computing technology in the form of powerful mainframe computers such as the IBM 360, which made it cost-effective and practical to develop Management Information Systems (MIS) for large businesses. [7]

[3] Laney D. “3-D Data Management: Controlling Data Volume, Velocity and Variety” Meta (2001)
[4] Ward J. and Barker A. “Undefined By Data: A Survey of Big Data Definitions” (2013)
[5] Hounshell D. “The Same Old Principles in the New Manufacturing” Harvard Business Review (November 1988)
[6] Power, D.J. “A Brief History of Decision Support Systems” DSSResources.com (March 2007)
[7] Ibid.


Westinghouse Electric Company recognised the potential of the system to improve profitability and began experimenting with the system to coordinate production planning for laundry equipment. [8] Other pioneering projects in the 1960s included the Semi-Automated Business Research Environment (SABRE), a flight booking and tracking system built by IBM for American Airlines. The system led to the creation of the first modern customer loyalty programme and remains in operation to this day.

Despite advances in computer technology in the 1970s and 1980s, processing power was limited and computer hardware remained expensive. It was common for medium-sized businesses to operate numerous large mainframe-based application systems with reporting programs that offered limited flexibility and required a computer programmer. [9]

EXHIBIT 2 – FIRST PHASE OF ANALYTICS (1960-2008) – ALPHABET SOUP
[Timeline from the 60s to the 00s showing the acronyms MIS (Management Information Systems), DSS (Decision Support Systems), ESS (Executive Support Systems), EDW (Enterprise Data Warehouse) and BI (Business Intelligence)]
Source: GP Bullhound

By the early 1980s Decision Support Systems, and closely related Executive Support Systems, began to gain recognition as a new class of information system which could support managers across a variety of business functions and at different levels within organisations. Important and related technology evolution during this period included query and reporting tools, a move away from mainframes to client servers and PCs, and the adoption of Online Analytical Processing (OLAP), which empowered users to slice and dice data in meaningful ways. Business Objects and Hyperion were among companies offering data Analytics capabilities in response to demand from business users increasingly seeking direct access to data and user friendly tools for ad hoc queries.

It was not until 1989, however, that Gartner analyst Howard Dresner proposed “Business Intelligence” (BI) as an umbrella term to describe the “concepts and methods to improve business decision making by using fact-based support systems.” It was another decade before the adoption of Business Intelligence became widespread across large companies, giving rise to an industry which has undergone significant consolidation and is estimated by Gartner to be worth $13.8 billion in 2013. [10]

[8] Morton M. “Reflections of Decision Support Pioneers” DSSResources.com (September 2007)
[9] Data-warehouses.net
[10] Gartner (February 2013)


COMPARING BI TO BIG DATA

Analytics is a broad term so it is useful to draw a distinction between the Analytics associated with traditional BI, and the more advanced Big Data Analytics which we see as a fundamentally different approach.

BI systems typically focus on answering the What, Where and When of business performance by providing information based on internally focused, operational data (e.g. ERP and CRM) in the form of reports and dashboards. A key limitation of traditional BI solutions is that they only allow users to analyse structured data, which significantly limits the amount and kinds of analysis that can be performed. Whilst standard business reports and OLAP-based BI Analytics can be very useful, they are reactive in that they inform users about past performance.

By contrast, Big Data Analytics typically involves the combination of more advanced data mining and machine learning algorithms (e.g. optimisation and predictive modelling), data distributed over a cluster of computing nodes, and Data Visualisation tools which encourage data discovery. Big Data Analytics is forward looking and more concerned with answering the Why and How questions, and indeed revealing questions that were not previously considered relevant. Where BI deals in known unknowns, Big Data Analytics is better placed to reveal unknown unknowns.

Whereas BI requires the data schema (essentially the column and row headings of a table) to be predefined to allow for queries along certain dimensions, Big Data Analytics can be performed on all types of raw data, and schemas are automatically generated as the data is read. Big Data Analytics also involves leaving the data where it resides and bringing the Analytics processing to the data rather than the other way round – this becomes significant with large data volumes, where transporting data between systems can become very expensive.

EXHIBIT 3 – BIG DATA VS. BUSINESS INTELLIGENCE

Source: Wikibon, GP Bullhound
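The difference between a predefined schema and a schema generated as the data is read can be illustrated with a small sketch. It is a toy illustration only, not any vendor's implementation: the sample records, the field names and the helper functions `load_schema_on_write` and `infer_schema_on_read` are all hypothetical.

```python
import json

# Hypothetical raw events; the second record carries a field the first lacks.
RAW_EVENTS = [
    '{"user": "alice", "amount": 12.5}',
    '{"user": "bob", "amount": 7.0, "device": "mobile"}',
]

# Predefined schema (traditional BI): columns are fixed before data is loaded.
PREDEFINED_COLUMNS = ("user", "amount")


def load_schema_on_write(lines, columns=PREDEFINED_COLUMNS):
    """Keep only the predefined columns; any other field is dropped at load time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(tuple(record.get(col) for col in columns))
    return rows


def infer_schema_on_read(lines):
    """Keep the raw records untouched and derive the schema while reading them."""
    records = [json.loads(line) for line in lines]
    schema = {}
    for record in records:
        for key, value in record.items():
            # Record the type seen the first time each field appears.
            schema.setdefault(key, type(value).__name__)
    return schema, records


if __name__ == "__main__":
    print(load_schema_on_write(RAW_EVENTS))
    schema, _ = infer_schema_on_read(RAW_EVENTS)
    print(schema)
```

In this sketch the predefined schema silently loses the `device` field, whereas reading the raw records surfaces it as a new column that can then be queried.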

We expect that whilst BI and traditional database technologies will remain important components of enterprise IT infrastructure, Big Data technologies will capture a growing proportion of incremental IT spending and become an increasingly strategic focus for organisations across all industries.


SECOND PHASE OF ANALYTICS: BIG DATA COMES OF AGE

The explosion of unstructured data (captured from system logs, multimedia files, smart phones, sensors etc.) has exposed the limitations of traditional database technologies and Analytics tools which were designed to handle structured enterprise data. Companies seeking to gain valuable insights from this growing torrent of data are increasingly investing in new technologies such as Hadoop, which allows large volumes of unstructured data to be stored and analysed at a fraction of the cost of traditional systems. The combination of these new technologies and advanced Analytics capable of providing broader and deeper insights is ushering in an era of Big Data Analytics.

EXHIBIT 4 – BIG DATA ANALYTICS – ABLE TO HANDLE UNSTRUCTURED DATA

Source: GP Bullhound

Awareness of the Big Data phenomenon can be traced back to a team of computer scientists at Silicon Graphics (SGI) in the mid-1990s, and the first significant academic references emerged in the late 1990s. [11]

Google released two landmark papers in 2003 and 2004 describing their distributed file system called GFS, and MapReduce, their distributed data processing platform, which together underpinned the Google search engine. [12] The concepts described in these papers helped inspire two part-time developers to create a file system and processing framework that would serve as the basis for Hadoop, an open source software framework that changed the economics of large scale data Analytics by bringing massively parallel computing to commodity hardware. Hadoop really took off when one of these developers, Doug Cutting (Hadoop is named after his son’s toy elephant), joined Yahoo! in 2006. (For more information on Hadoop, see “Big Data Infrastructure: Now ready for large scale adoption”.)
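The MapReduce programming model mentioned above can be sketched as a word count, the example commonly used to introduce it. The code below is a single-process illustration of the map, shuffle and reduce steps and does not use Hadoop's actual API; the function names are chosen here for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each map call would run on the node holding that split;
    # here the calls simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data big value", "data drives value"]
    print(word_count(splits))
```

Because each map call depends only on its own split and each reduce call only on its own key, both steps can be spread across many commodity machines, which is the property the paragraph above credits with changing the economics of large scale Analytics.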

In April 2008, Hadoop became the fastest system for sorting a terabyte of data, breaking the world record by completing the sort in just under three and a half minutes on a 910-node cluster. [13] In the last decade, Hadoop has evolved from a research project to become a widely adopted open industry standard for distributed data processing which now underpins many of the world’s largest internet businesses.

The term Big Data appeared in The Economist for the first time in early 2010 and McKinsey published an influential report on the subject in May 2011. [14] Google search trends indicate that Big Data gained popularity at around the same time, and the concept entered the mainstream in 2012, featuring as a topic at the World Economic Forum in Davos that year. [15]

[11] Diebold F. “On the Origin(s) and Development of the Term ‘Big Data’” Penn Institute for Economic Research (September 2012)
[12] Dean J. and Ghemawat S. “MapReduce: Simplified Data Processing on Large Clusters” Google Inc. (December 2004)
[13] O’Malley O. “TeraByte Sort on Apache Hadoop” Yahoo! (May 2008)
[14] Manyika, et al., “Big data: The next frontier for innovation, competition, and productivity” McKinsey Global Institute (May 2011)
[15] Lohr S. “How Big Data Became so Big” New York Times (August 2012)


HOW BIG IS BIG?

“Every day, 3 times per second, we produce the equivalent amount of data that the Library of Congress has in its entire print collection. Most of it is...irrelevant noise. So unless you have good techniques for filtering and processing the information, you’re going to get into trouble.” Nate Silver, Statistician

At the time of writing, Facebook claims to have the largest Hadoop cluster in the world based on storage capacity, with over 100 PB of storage over 2,000 nodes, growing by roughly half a PB per day. [16] To put this in context, a single petabyte (PB) can store enough mp3 music files to play continuously for 2,000 years. Every minute, 208,300 photos are uploaded to Facebook, bringing the total number of photos uploaded to the platform to 240 billion.

EXHIBIT 5 – A MINUTE IN THE LIFE OF THE INTERNET
208,000 photos uploaded
350,000 tweets
100 hours of video uploaded
120 new accounts
3.5 million search queries
$118,000 in revenue
Source: GP Bullhound
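The petabyte and per-minute figures above can be sanity-checked with some simple arithmetic. The sketch below rests on two assumptions that are not stated in the report: decimal units (1 PB = 10^15 bytes) and MP3 audio encoded at a constant 128 kbit/s. The function names are illustrative.

```python
# Assumptions (not from the report): decimal units and 128 kbit/s MP3 audio.
BYTES_PER_PB = 10**15
MP3_BITS_PER_SECOND = 128_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_of_mp3(petabytes, bits_per_second=MP3_BITS_PER_SECOND):
    """Years of continuous playback that fit in the given number of petabytes."""
    bytes_per_second = bits_per_second / 8
    seconds = petabytes * BYTES_PER_PB / bytes_per_second
    return seconds / SECONDS_PER_YEAR


def per_day(per_minute):
    """Scale a per-minute rate to a per-day total."""
    return per_minute * 60 * 24


def days_to_grow(extra_pb, growth_pb_per_day=0.5):
    """Days needed to add extra_pb at the quoted rate of half a PB per day."""
    return extra_pb / growth_pb_per_day


if __name__ == "__main__":
    print(round(years_of_mp3(1)))   # years of music in one petabyte
    print(per_day(208_300))         # photos uploaded per day
    print(days_to_grow(100))        # days to add another 100 PB
```

Under these assumptions one petabyte holds just under 2,000 years of music, 208,300 photos a minute is roughly 300 million photos a day, and a cluster growing by half a petabyte a day adds another 100 PB in 200 days.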

Every day an average of 500 million messages are posted on Twitter and over 1 billion files are uploaded to Dropbox. IBM estimates that we are now creating 2.5 billion gigabytes of data every day; as much as 90% of the data which currently exists, ranging from digital pictures and videos, to social media posts and remote sensor data, was created in the last two years alone. [17]

The digital universe is doubling every two years and IDC estimates that by 2020 the digital universe will be 40,000 exabytes or 40 trillion gigabytes. [18] The vast majority of new data being generated is unstructured data and IDC estimates that only 0.5% is currently being analysed.
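A doubling period translates directly into a compound annual growth rate, and the unit conversion quoted above is easy to verify. The sketch below is illustrative only; it assumes decimal units (1 exabyte = 10^9 gigabytes) and the function names are chosen here for convenience.

```python
GB_PER_EXABYTE = 10**9  # decimal units: 1 EB = 10**18 bytes, 1 GB = 10**9 bytes


def annual_growth_from_doubling(doubling_years):
    """Compound annual growth rate implied by a fixed doubling period."""
    return 2 ** (1 / doubling_years) - 1


def project(size, annual_growth, years):
    """Size after compounding at annual_growth for the given number of years."""
    return size * (1 + annual_growth) ** years


if __name__ == "__main__":
    rate = annual_growth_from_doubling(2)
    print(f"{rate:.1%}")              # growth rate for a two-year doubling
    print(project(1.0, rate, 2))      # compounding for two years doubles the size
    print(40_000 * GB_PER_EXABYTE)    # 40,000 exabytes expressed in gigabytes
```

Doubling every two years corresponds to growth of about 41% a year, which is broadly in line with the 45% CAGR shown in Exhibit 6, and 40,000 exabytes is indeed 40 trillion gigabytes.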

The rapid adoption of smartphones and connected devices is having a major impact on the volume of data traffic crossing the network. Intel estimates that the number of networked devices equalled the world’s population in 2012. This number is expected to reach double the world’s population by 2015, by which point it would take an estimated five years to view all the video content crossing IP networks every second. [19] The range of connected devices is beginning to extend beyond phones and tablets to cars and a wide range of other devices – this phenomenon is termed the ‘internet of things’ and will contribute to the ever growing quantities of data.

[16] http://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
[17] IBM Big Data Success Stories (Oct 2011)
[18] Gantz J. “The Digital Universe in 2020” IDC (December 2012)
[19] Temple K. “What Happens in an Internet Minute”, Intel (March 2012)


EXHIBIT 6 – THE SCALE OF BIG DATA (1 EXABYTE = 1 TRILLION MEGABYTES)
[Chart: size of the digital universe in exabytes, 2013-2020 CAGR: 45%. By 2020: unstructured data 30,000 exabytes, structured data 10,000 exabytes]
Source: GP Bullhound, IDC

BIG DATA EXPECTED TO DRIVE BIG VALUE

Many industries will be significantly impacted by Big Data Analytics. Although this technology revolution remains in its early stages, we believe that Big Data Analytics has the potential to unleash a wave of productivity and efficiency gains across virtually all industries, and have a significant impact on the economy and society as a whole. In a recent report exploring “game changers” for the US economy, the McKinsey Global Institute (MGI) identified Big Data Analytics as one of the top five catalysts for both economic growth and increasing competitiveness and productivity beyond 2020. The report estimates that the widespread adoption of Big Data Analytics in retail and manufacturing alone could contribute an additional $325 billion to US GDP by 2020, whilst delivering $285 billion in productivity gains in the health care and government sectors. [20]

Major retailers have been handling large volumes of data for many years and the industry has been an early adopter of Big Data technologies. In a well-publicised and controversial case from 2012, the US retail group Target successfully used historical transaction data to assign each shopper a “pregnancy prediction” score in order to offer baby product promotions to the right customers at the right time. [21]

US retailer Walmart handles over a million customer transactions an hour, which are stored in databases estimated to contain more than 2.5 petabytes of data. The company was able to use Big Data Analytics to drive an estimated 10-15% increase in completed online sales, equating to over $1 billion in incremental revenue. [22] Dunnhumby, the marketing services firm majority owned by Tesco, analyses data drawn from over 350 million people, including Tesco Clubcard holders, to help the retail giant target promotions more effectively and improve customers' retail and brand experience.

[20] Lund S. et al. “MGI, Game changers: Five opportunities for US growth and renewal” McKinsey Global Institute (July 2013)
[21] Duhigg C. “How Companies Learn Your Secrets” New York Times (February 2012)
[22] Romanov A. “Putting a Dollar Value on Big Data Insights” Wired (July 2013)


EXHIBIT 7 – BIG DATA ANALYTICS IN ACTION

Marketing Analytics
• Optimising campaigns by analysing higher volumes of granular data like clickstream and weather data
• Product & Pricing optimisation by analysing large volumes of data to assess likely impact of changes

Customer Analytics
• Online retailers use Hadoop to recommend products and services based on user profile analysis and behavioural Analytics
• Big Data technologies optimise the entire customer experience based on insights from analysing data across a variety of channels

Web Analytics
• Machine Learning algorithms are used to identify and rank individuals with most influence for a particular topic
• Advanced text Analytics tools analyse unstructured data from sites such as Twitter and Facebook to determine sentiment

Fraud & Risk Analytics
• Businesses are analysing terabytes of data from forums associated with hackers to predict and detect financial fraud and identity theft
• Financial institutions analyse large volumes of transaction data to determine exposure of financial assets and score potential customers for risk

Operational Analytics
• Companies are analysing the wealth of information collected by business systems and external sources to optimise business processes
• Process modelling and simulation allows companies to assess the impact of complex operational changes before they are implemented

Source: Big Data Partnership, GP Bullhound

Although the travel industry has also had access to large volumes of structured transactional data for many years, online travel companies are now using Big Data Analytics to improve the performance of complex product searches. Edinburgh-based travel search site Skyscanner, which recently received investment from Sequoia Capital, uses Big Data Analytics to process over 1 billion flight prices each day, generating $3.5 billion in flight bookings in the last 12 months alone.

Recent advances in genomics are highlighting how Big Data Analytics could transform the treatment of cancers by pinpointing critical gene mutations in order to develop more effective and targeted therapies. The cost of sequencing the first human genome was approximately $3 billion and the project took several international institutions, hundreds of researchers and 13 years to complete. Over the last decade, the cost of sequencing a human genome has dropped from close to $100 million in 2001 to just a few thousand dollars, and sequencing can now be performed in a matter of days. [23] Genome sequencing costs are now dropping at four times the rate of Moore's law and we are rapidly approaching the $1,000 genome (Exhibit 8).

A key challenge to the development of these more advanced treatments is the need to store and analyse the enormous volumes of sequencing data which is now being created at an astonishing rate. [24] A number of start-ups are developing Big Data platforms to offer scalable and high performance genomic data analysis, including Bina Technologies which recently launched an on-demand solution that can analyse a whole human genome in a little less than four hours.

[23] http://www.genome.gov/sequencingcosts/
[24] Resnick R. “Implications of exponential growth of global whole genome sequencing capacity” Genomequest (July 2010)


In the public sector, the Open Data Institute, founded by Sir Tim Berners-Lee and Artificial Intelligence pioneer Sir Nigel Shadbolt, is promoting innovative uses of Open Data to help address social, environmental and economic challenges. By democratising key data sets (e.g. maps, national surveys), we believe such initiatives can play an important role in reducing barriers to entry and catalysing innovation around Analytics. A recently published report by the McKinsey Global Institute estimates that Open Data can help unlock $3.2 trillion per year in economic value across seven “domains” which include education, transportation and healthcare. [25]

EXHIBIT 8 – COST PER GENOME (LOGARITHMIC SCALE)
[Chart: cost per genome by year, 2001 to 2012, on a logarithmic axis running from $1,000 to $100,000,000]
Source: GP Bullhound, NHGRI

Ultimately, we believe the last 50 years of BI history, and more recently the emergence of the Big Data phenomenon, have been a prelude to the era the world is now entering, where Big Data Analytics will emerge as a critical part of the decision making process for leading businesses across all sectors.

[25] Manyika J. et al. “Open data: Unlocking innovation and performance with liquid information” McKinsey Global Institute (October 2013)

MARKET SIZE AND GROWTH

The Big Data market is showing strong growth across software, services and hardware. Industry analysts are forecasting an aggregate compound annual growth rate (CAGR) of 30-40% over the next five years, with the software sub-sector at the high end of that range. Within this growth, the Big Data software sector – including both applications (e.g. visualisation and predictive tools) and infrastructure software (e.g. databases and middleware) – is forecast to more than quintuple in size, resulting in a Big Data software market worth $12.2bn in 2017, up from $2.2bn in 2012. [26] This figure of around $12bn is approximately equal to the entire Business Intelligence software market today, giving a startling sense of how pervasive Big Data software is likely to become in the medium term.
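The growth rate implied by the software forecast above can be checked with the standard compound annual growth rate formula. This is a small worked example using only the two figures quoted in the text ($2.2bn in 2012 and $12.2bn in 2017); the function names are illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_multiple(start_value, end_value):
    """How many times larger the end value is than the start value."""
    return end_value / start_value


if __name__ == "__main__":
    # Figures quoted in the text: $2.2bn in 2012 rising to $12.2bn in 2017.
    print(f"{growth_multiple(2.2, 12.2):.1f}x")
    print(f"{cagr(2.2, 12.2, 5):.1%}")
```

The two figures imply a multiple of about 5.5x and a CAGR of roughly 41% over five years, consistent with software sitting at the high end of the 30-40% range forecast for the market as a whole.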

EXHIBIT 9 – BIG DATA – A $50BN MARKET BY 2017
[Stacked bar chart: total Big Data market in US$ bn, split into Software, Hardware and Services]

Year            2011   2012   2013   2014   2015   2016   2017
Total (US$ bn)   7.3   11.6   18.4   28.3   38.7   44.2   48.5

Source: Wikibon, GP Bullhound

Drilling into the software segment of the Big Data market, we expect that applications (analytical and transactional), as opposed to infrastructure software and databases, will see the highest levels of growth. While applications currently represent 44% of the Big Data software segment, Wikibon expects this share to increase to 60%+ by 2017, growing at a 56% CAGR (2011-2017) to $7.4bn in 2017.

EXHIBIT 10 – BIG DATA SOFTWARE – APPLICATIONS TO BECOME THE LARGEST SEGMENT

[Bar chart (US$bn) comparing 2012 and 2017 by segment: Infrastructure Software, SQL Databases, NoSQL Databases and Applications. Applications are labelled $1.0bn in 2012 and $7.4bn in 2017]
Source: Wikibon, GP Bullhound

[26] Kelly, J. et al. “Big Data Market Size and Vendor Revenues” Wikibon (February 2013)

ANALYTICS ENTERS A NEW ERA

THIRD PHASE OF ANALYTICS: DATA-DRIVEN BUSINESS

“Organizational judgment is in the midst of a fundamental change—from a reliance on a leader’s ‘gut instinct’ to increasingly data-based Analytics” Erik Brynjolfsson, Director at the MIT Centre for Digital Business

Organisations across a number of industries are increasingly investing in the technologies and capabilities required to leverage the growing volume and variety of data to make better decisions. According to a recent survey by Gartner, 64% of organisations are investing or planning to invest in Big Data technology, compared with 58% in 2012 [27].

Although companies have been capturing growing volumes of data for many years, it is only relatively recently that a confluence of trends has made it economically viable to store and analyse large volumes of unstructured data in meaningful ways. These trends include the emergence of innovative storage and processing architectures such as Hadoop, and next-generation non-relational and often open-source databases, which are commonly referred to as “NoSQL”. Other trends include the exponential increases in computational power, declining costs of memory, and cloud computing (e.g. Amazon Elastic MapReduce). MIT research suggests that companies embedding data-driven decision making within their operations demonstrate higher productivity and Return on Equity than their peers [28].

Whilst much of the early Big Data investment and media focus has been on Big Data storage and processing technologies, attention is now gravitating rapidly toward the analytical software tools and platforms required to drive value creation from Big Data. As Professor Gary King at the Harvard Institute for Quantitative Social Science recently noted: "Big data isn't about the data. It's about Analytics." [29]

BIG DATA INFRASTRUCTURE: NOW READY FOR LARGE SCALE ADOPTION

A wave of innovative technology is powering the new generation of Analytics solutions. This wave spans software, hardware, and even entire computing paradigms, all of which are now reaching ‘enterprise-readiness’, and hence creating a perfect storm for mainstream adoption.

Software Innovations: Hadoop; In-memory databases; NoSQL databases
Hardware Innovations: Solid-state drives and DRAM; Multicore CPUs; Fibre optics
Computing Paradigm Shifts: Cloud computing (IaaS/PaaS/SaaS); Open source

27 Kart, L. et al. “Big Data Adoption in 2013 Shows Substance Behind the Hype”, Gartner (Sep 2013)
28 Brynjolfsson, E. et al. “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?”, MIT (January 2012)
29 King, G. “Big Data is Not About the Data!”, NEAI (May 14 2013)


The following table picks out our ‘Top 5 Big Data Technologies’ and summarises what they are, why they are important, and who the key players are globally.

EXHIBIT 11 – TOP 5 BIG DATA TECHNOLOGIES

Hadoop
What it is: Software framework for distributed data storage and computing
Why it’s important: Big Data infrastructure solution: highly scalable, and accommodates unstructured data
Key players: Cloudera, Hortonworks, MapR, Amazon Web Services

Cloud Computing
What it is: Computing paradigm to leverage economies of scale in computing infrastructure
Why it’s important: Reduces capex requirements for enterprises wanting to analyse Big Data; can scale up/down fast
Key players: Amazon Web Services, Microsoft Azure, Google, Rackspace, IBM

Open Source
What it is: Software development paradigm which advocates cooperation and openness
Why it’s important: Pools the brainpower of more developers for the creation and testing of software
Key players: Apache Software Foundation, Open Source Software Institute

NoSQL
What it is: Software for storing data; a different approach to traditional relational databases
Why it’s important: Allows for more data types (including ‘unstructured’), and large data volumes
Key players: Cassandra, MongoDB, HBase, Riak, Couchbase, Neo4j

In-memory Databases
What it is: Software for storing data in DRAM rather than on mechanical HDDs
Why it’s important: Leverages speed advantage (c.100,000x) of DRAM over HDDs, giving much faster analytics
Key players: SAP HANA, VoltDB, MemSQL, GridGain, Apache Spark

Source: GP Bullhound

Of all of the innovations listed above, Hadoop is perhaps the most important for the on-going revolution in Big Data Analytics. Drawing inspiration from two research papers published by Google referred to in the previous section, Hadoop’s creators sought to tackle the problem of how to store and analyse large volumes of internet data.

Their solution to the problem of Big Data was to scale out the size of the computer system horizontally using many commodity computers and develop software to harness the combined compute power and storage. Specifically, Hadoop combines a file storage system (‘HDFS’), which stores data (including unstructured data) in a distributed fashion across all the machines (or ‘nodes’) in the system (or ‘cluster’), with a method for mapping out the computing operations necessary for conducting data analysis across those machines (‘MapReduce’). This combination of HDFS and MapReduce is the backbone of the Hadoop framework. The scalability and resilience of the solution are evidenced by companies like Yahoo!, whose biggest single Hadoop cluster comprises over 4,500 nodes.
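To make the MapReduce idea more concrete, the following is a minimal single-machine sketch of the map, shuffle and reduce steps, using a word count as the task. It is plain Python rather than Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the two sample input splits are hypothetical and chosen for illustration only. On a real cluster each split would be stored on a different node and the map and reduce steps would run in parallel across those nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input splits; on a real cluster each would sit on a different node.
splits = [
    "big data is not about the data",
    "it is about analytics",
]

mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call only looks at its own split and each reduce call only looks at one key, the work can be spread over as many machines as there are splits and keys, which is the property that allows the approach to scale horizontally.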

We expect Hadoop adoption to increase and total cost of ownership (TCO) to continue to decline. Hadoop is part of the Apache Software Foundation and is hence open source. This means that anyone can contribute to developing the code base and any enterprise can download the code under a free licence to implement Hadoop for their own data. This has also fuelled adoption, since the combination of free software licences and commodity hardware requirements potentially results in significantly lower TCO than proprietary Big Data systems such as HP Vertica or the Oracle Big Data Appliance. Indeed, at the Hadoop Summit in San Jose in June 2013, MapR, a leading Hadoop software and services provider, asserted that Hadoop can be up to 50x cheaper than alternatives for storing large amounts of data [30]. We believe that incumbent vendors are responding, and that the cost per TB benefit of a Hadoop data management system has narrowed to around 10x.

EXHIBIT 12 – HADOOP PRICING ADVANTAGE ON A PER TERABYTE BASIS

System                           Cost / Terabyte   Hadoop Benefit
Hadoop                           $333              -
Netezza                          $10,000           30x saving
Exadata                          $14,000           42x saving
Extreme Data Appliance (1650)    $16,500*          50x saving

Source: MapR (June 2013)
* Teradata has since launched a new Extreme Data Appliance (1700) at $2,000/TB in October 2013
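The saving multiples in Exhibit 12 follow from dividing each system's cost per terabyte by the Hadoop figure. The short sketch below reproduces that arithmetic from the prices quoted in the exhibit; the dictionary layout and variable names are illustrative assumptions. It also applies the same calculation to the $2,000/TB Extreme Data Appliance (1700) price from the exhibit footnote, which works out at roughly 6x and so points in the same direction as the narrowing described in the text.

```python
# Cost per terabyte (US$) as quoted in Exhibit 12 (source: MapR, June 2013).
cost_per_tb = {
    "Hadoop": 333,
    "Netezza": 10_000,
    "Exadata": 14_000,
    "Extreme Data Appliance (1650)": 16_500,
    "Extreme Data Appliance (1700)": 2_000,  # price from the exhibit footnote
}

hadoop_cost = cost_per_tb["Hadoop"]

# Saving multiple = competitor cost per TB divided by Hadoop cost per TB.
savings = {
    system: cost / hadoop_cost
    for system, cost in cost_per_tb.items()
    if system != "Hadoop"
}

for system, multiple in savings.items():
    print(f"{system}: {multiple:.0f}x")
```

These multiples compare list cost per terabyte only; they say nothing about staffing, integration or support costs, which also feed into total cost of ownership.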

Despite this advantage, early versions of Hadoop have required PhD-level computing knowledge, often necessitating costly new hires or consultants to bring the necessary skill set into the organisation. However, we are already starting to see increased ease of use come through with the latest iterations of Hadoop. For example, Hadoop 2.0, which has just been released, allows for other processing algorithms besides MapReduce, which has to date been the most challenging aspect of Hadoop to develop code for.

Given the immaturity of the Big Data infrastructure market, fragmentation is high. For instance, the Apache wiki site lists 29 different companies offering their own twist on (or ‘distribution’ of) Hadoop [31]. Companies like MapR, Cloudera and Hortonworks are focused on developing enterprise-grade Hadoop distributions, with features including data protection, disaster recovery, and support for heterogeneous hardware clustering. In addition, beyond the core Hadoop framework, there is a broad array of NoSQL databases, ranging from Cassandra to Voldemort, offering an alternative means of storing and analysing Big Data, particularly for real-time analysis, for which Hadoop is not designed. We see graph databases (e.g. Neo4j from Swedish company Neo Technology) as a particularly interesting subset of these, which have the potential to drive new applications based on alternative data relationships.

EXHIBIT 13 – A FRAGMENTED SPECTRUM OF NOSQL DATABASES

Source: GP Bullhound

30 http://www.slideshare.net/Hadoop_Summit/rosen-june26-205pmroom212-v3
31 http://wiki.apache.org/Hadoop/Distributions%20and%20Commercial%20Support


We expect the Big Data infrastructure technology segment to remain one of the most vibrant and innovative areas in tech over the next three years. Three themes we see as likely are:

1. Open source infrastructure solutions becoming increasingly ubiquitous: This standardisation places greater emphasis on the Analytics software as the locus of competitive advantage for enterprises. Leading companies in the Hadoop community such as Cloudera, Hortonworks, and MapR are spearheading improvements in the platform.

2. SQL arriving on Hadoop: Most analytic queries today are written in SQL, a query language ideally suited to analysing data stored in traditional databases. There are 30+ products and open-source projects underway to bring SQL, or SQL-esque coding, to Big Data sets in Hadoop [32] (e.g. Hive, Drill, Teradata Aster), and we expect that this technology will mature over the next 18 months, helping enterprises to transition away from the traditional data warehouse world.

3. Convergence on interactive Big Data Analytics (Exhibit 14): There are currently at least three significant projects underway to improve the speed of Hadoop analytical querying to ‘interactive’ (