White Paper

Big Data to Big Value
How Qlik can help you gain value from your Big Data
September 2017

qlik.com


Table of Contents

Executive Summary
Introduction
The growing need for Big Data analytics
How Big Data flows from source to analysis
Utilizing Big Data: Focus on relevance and context
Different methods for different data volumes and complexities
Comparison of different Big Data Access methods
Qlik and Big Data connectivity
Qlik goes the last mile with Big Data


Executive Summary

• Big Data’s promised benefits are not realized until there is a way for business users to easily analyze data.
• The key to unlocking value lies in presenting only what is relevant and contextual to the problem at hand.
• Different data volumes and complexities are best met using different methods or a combination of methods.
• Qlik offers multiple methods and best practices to give customers a significant advantage in time-to-insight when it comes to analyzing Big Data.

Introduction

There continues to be an incredible amount of interest in the topic of Big Data. It has moved beyond being a trend and is now simply part of the current IT lexicon. For some organizations, its use has already become an operational reality, providing an unprecedented ability to store and analyze large volumes of disparate data that are critical to the organization’s competitive success. It has enabled people to identify new opportunities and solve problems they could not solve before. For other organizations, Big Data is still something that needs to be better understood in terms of its relevance to the company’s current and future business needs. This paper reviews how data flows from source to analysis and then discusses how the Qlik data analytics platform can help companies gain the most leverage from a Big Data implementation by easing access and making Big Data both relevant and in context for the organization’s business users.

Big Data to Big Value | 3

The growing need for Big Data analytics

Historically, the uses of Big Data focused on data scientists running very complex algorithms on massively parallel computing clusters to solve major challenges in academia, government, and the private sector. While the need for data scientists to solve such complex problems still exists, there is a much broader need for end users to harness the power of Big Data analytics for a variety of business issues. Unlike the algorithmic model, which seeks to find the needle in the haystack by mining through all the available data, business users are more likely to ask ad hoc questions that focus on the slices of data that relate to them. They want new insights that help them answer actionable business questions such as:

• How have my product sales performed since we ran the last promotion?
• How effectively is my sales team cross-selling our products?
• Which of my products are NOT selling well? Does this vary by region or sales team?
• Is there a lack of redundancy anywhere within my plant’s supply chain? What happens if a natural disaster cuts off our primary suppliers?
• Does the service call history for my region indicate any pattern of customer satisfaction or dissatisfaction?

Business users were asking these types of questions long before the advent of Big Data, but the questions could not be answered with a high degree of certainty or granularity because key data sets didn’t exist or were impractical to access. Business users were unable to combine their intuition with better data to arrive at better decisions. Now, however, the technology exists to make Big Data sources available to business users. Qlik provides both rapid, flexible analytics on the front end and the ability to integrate data from multiple sources (e.g., Hadoop repositories, data warehouses, departmental databases, and spreadsheets) in a single, interactive analytics layer.


How Big Data flows from source to analysis

To make an analogy from metal mining, raw ore must be extracted from the earth and transported to plants that use mechanical and chemical processes to refine the metal; only then can it be fashioned into jewelry or other products. Likewise, data follows a journey from its raw form to delivering business insight:

• Gather. Business-oriented Big Data typically originates as machine or IoT data (e.g., data streams, server logs, and RFID logs), transaction data (e.g., website activity, point-of-sale data from physical stores), and cloud data (e.g., stock ticker prices, social media feeds). This data is often unstructured (strings of text or images) or semi-structured (log data with a timestamp, IP address, and other details). In the common definition of Big Data, this sort of data has high volume (terabytes to petabytes), high velocity (many terabytes of new data per day), and high variety (hundreds of different types of servers and applications, each creating information in its own format).
• Initial processing. If cost of storage is the primary concern, the data is often copied to a Hadoop cluster. The Hadoop Distributed File System (HDFS) is an example of a distributed, scalable, and portable file system designed to run on commodity hardware. Hadoop jobs such as MapReduce enable highly parallel data manipulation and aggregation, but this is typically only sufficient as a first-level interpretation of the raw data. Accelerator tools such as Apache Drill, Spark, and Cloudera Impala provide open source means for external systems, such as Qlik, to query the data stored in Hadoop more effectively.
• Refinement. Quite often, organizations also employ an enterprise data warehouse (EDW), which serves as the central repository for structured data that requires analysis. EDWs are designed not just for storage volume; they also have robust ETL (extract, transform, load) capabilities, so they play a complementary role to Hadoop clusters. EDWs can extract data directly from the data source, from a SAN (storage area network) or NAS (network attached storage) system, or from Hadoop clusters. Because data in an EDW is structured rather than raw, it is easier to query and represents a higher level of meaning than raw data.
• Analyze. The typical business user needs the flexibility to integrate data from multiple sources without having to know where the data comes from or how it is organized. Data modeling must be fast and must easily span different data sources. Such an environment not only reduces the burden on IT to keep up with business demands, but also empowers business users to incorporate additional data into their analysis in a timely manner, as needed.
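The gather and refinement steps above can be illustrated at toy scale. The sketch below assumes hypothetical log lines in a `<timestamp> <ip> <event>` layout and uses Python's built-in `sqlite3` in-memory database purely as a stand-in for an EDW. It shows semi-structured text being extracted, transformed into structured rows (malformed lines are dropped), and loaded into a table that is then easy to query. It is not how any particular EDW, or Qlik, performs ETL.

```python
import sqlite3
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical raw, semi-structured log lines: "<timestamp> <ip> <event>".
RAW = [
    "2017-09-01T10:00:00 10.0.0.1 purchase",
    "2017-09-01T10:00:05 10.0.0.2 view",
    "garbled line",
    "2017-09-02T09:30:00 10.0.0.1 view",
]

def extract(lines: Iterable[str]) -> Iterator[str]:
    """Extract: read non-empty lines from the raw source."""
    return (line.strip() for line in lines if line.strip())

def transform(line: str) -> Optional[Tuple[str, str, str]]:
    """Transform: parse one line into (day, ip, event); None if malformed."""
    parts = line.split()
    if len(parts) != 3 or "T" not in parts[0]:
        return None
    day = parts[0].split("T")[0]
    return day, parts[1], parts[2]

def load(conn: sqlite3.Connection, rows: List[Tuple[str, str, str]]) -> None:
    """Load: write structured rows into a warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (day TEXT, ip TEXT, event TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for an enterprise data warehouse
rows = [r for r in map(transform, extract(RAW)) if r is not None]
load(conn, rows)
print(conn.execute(
    "SELECT day, COUNT(*) FROM events GROUP BY day ORDER BY day").fetchall())
# [('2017-09-01', 2), ('2017-09-02', 1)]
```

Once the data is structured, a single SQL query answers a question that would have required parsing every raw line.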


Utilizing Big Data: Focus on relevance and context

Business users are constantly challenged to efficiently access, filter, and analyze data, and gain insight from it, without using data analytics solutions that require specialized skills. They need better, easier ways to navigate through massive amounts of data to find what’s relevant to them and to get answers to their specific business questions, so they can make better and quicker decisions.

Qlik sees a few common misconceptions about how Big Data fits into the overall analysis needs of the business user. It is important to understand that:

• The most important data may not be in the Big Data repository. Often, the data from the Big Data repository acts as supporting evidence for a discovery initially made in operational data or even in a spreadsheet. For example, a spreadsheet or small database containing customer satisfaction survey results may be the basis for an analytic inquiry, and the data from a Big Data repository allows the user to correlate each customer’s service and support history with their satisfaction scores.
• The data needed for analysis may be scattered across multiple repositories. Configuring an enterprise data warehouse may involve not only copying data from an operational data source but also metadata modeling and transformations. Because this can be time-consuming or cost-prohibitive, some operational sources may remain separate: they don’t warrant the cost and effort of loading them into the data warehouse.

Two important aspects to consider when working with Big Data are the relevance and the context of the information.

Relevance: the right information to the right person at the right time

Qlik’s approach has always been to understand what business users require from their analysis, rather than to force-feed a solution that might not be appropriate. Access to the appropriate data at the right time is more valuable to users than access to all the data, all the time. For example, local bank branch managers may want to understand the sales, customer intelligence, and market dynamics in their own branch’s catchment area, not across the entire nationwide branch network. With a simple consideration like this, the conversation moves from one of large data volumes to one of relevance and value.

King.com uses Qlik for Big Data analytics

“Implementing Qlik has cost less than 20% of the alternative solutions. The payback period was just a few months.” – Mats-Olov Eriksson, main architect of the analytics system

Background:
• Worldwide leader in casual social games
• Offers 150 games in 14 languages
• 40 million monthly players
• 2 billion rows of new log data per day

Use case:
• Analyze ROI of marketing campaigns
• Track update of new game offers

Technology:
• Logs stored in a 14-node Hadoop cluster
• Batch processing creates KPIs and aggregates in Hive
• Qlik connects to Hive via ODBC (Open Database Connectivity)
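The King.com setup relies on batch processing that turns raw logs into KPIs and aggregates before Qlik reads them. The following is a single-process sketch of the MapReduce pattern behind that kind of batch job: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. It is not Hadoop or Hive code, and the log layout and event names are hypothetical; on a real cluster the map and reduce steps run in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical semi-structured game log lines: "<timestamp> <player_id> <event>"
LOGS = [
    "2017-09-01T10:00:00 p1 install",
    "2017-09-01T10:00:05 p2 purchase",
    "2017-09-01T10:01:00 p1 purchase",
    "2017-09-01T10:02:00 p3 install",
    "corrupted",
]

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (event, 1) for each well-formed line; skip malformed ones."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            yield parts[2], 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate each key's values into a KPI-style count."""
    return {key: sum(values) for key, values in groups.items()}

kpis = reduce_phase(shuffle(map_phase(LOGS)))
print(kpis)  # {'install': 2, 'purchase': 2}
```

Because map and reduce only ever look at one line or one key at a time, both steps can be spread across machines. That is also why the output is a first-level aggregate rather than an interactive analysis.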


Context: what does the Big Data mean in the context of other sources of insight?

Qlik’s patented, innovative Associative Engine is designed specifically for interactive, free-form exploration and analysis, so data is naturally surrounded with context. Qlik’s associative experience means that every piece of data is dynamically associated with every other piece of data, across all data sources. Qlik also offers powerful on-the-fly calculation and aggregation that instantly updates all analytics and highlights all associations based on user interactions. For example, a Sales by Region chart may be surrounded by related visualizations, such as a Sales by Product chart, or by interactive list boxes that contain contextual information such as date, location, customer, and sales history. Any time the user makes a selection in one chart or list box, every other list box and chart is instantly updated based on that selection. This unique capability of Qlik makes it incredibly easy for a business user to focus on, for example, a particular product in a particular geography sold to a particular customer, and to see only the data that is relevant to them. The usefulness of these associations is even more apparent when there are hundreds or thousands of products, customers, geographies, and so on: extremely large datasets can be sliced with a few clicks rather than by scrolling through thousands of items. With Qlik, context and relevance go hand in hand and quickly reduce what seemed to be a Big Data problem to something quite manageable, without any programming or advanced visualization skills.
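A simplified way to picture the selection behavior described above: apply the user's selections to the rows, then classify every value of every field as either still possible (associated with the selection) or excluded. The sketch below does this for a single hypothetical table. It only illustrates the idea and is not Qlik's Associative Engine, which works across many linked tables and compressed in-memory structures.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sales rows; field names are illustrative only.
ROWS: List[Dict[str, str]] = [
    {"Region": "North", "Product": "Bikes",   "Customer": "Acme"},
    {"Region": "North", "Product": "Helmets", "Customer": "Bolt"},
    {"Region": "South", "Product": "Bikes",   "Customer": "Cord"},
    {"Region": "West",  "Product": "Gloves",  "Customer": "Acme"},
]

def associate(rows: List[Dict[str, str]],
              selections: Dict[str, Set[str]]) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """For each field, return (possible values, excluded values) under the selections."""
    # Rows that satisfy every selection (AND across fields, OR within a field).
    matching = [r for r in rows
                if all(r[f] in values for f, values in selections.items())]
    result: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for field in (rows[0].keys() if rows else []):
        all_values = {r[field] for r in rows}
        possible = {r[field] for r in matching}
        result[field] = (possible, all_values - possible)
    return result

state = associate(ROWS, {"Region": {"North"}})
print(sorted(state["Product"][0]))   # ['Bikes', 'Helmets']
print(sorted(state["Product"][1]))   # ['Gloves']
print(sorted(state["Customer"][1]))  # ['Cord']
```

Adding a second selection, such as `{"Product": {"Bikes"}}`, narrows the matching rows further (only the Acme row remains). This mirrors how each click updates every other chart and list box.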

Different methods for different data volumes and complexities

Because Big Data is a relative term and the use cases and infrastructure in every organization are different, Qlik offers multiple techniques to handle Big Data scenarios:

• In-memory
• Segmentation
• Chaining
• On-demand App Generation
• Other methods

Figure: Multiple techniques to handle Big Data

In some cases, one method may be sufficient. Other scenarios may dictate the use of multiple methods working together. Every situation is different. We put the power in the hands of our customers to decide how they will best manage the inherent tradeoffs between flexibility, user performance, and the typical Big Data characteristics of data volume, variety, and velocity. This section reviews the different Qlik methods that can be used in Big Data scenarios.


In-memory

Because the Qlik Associative Engine is optimized for in-memory speed, compressing data down to 10% of its original size, many Qlik customers find that the inherent capabilities of the product satisfy their Big Data requirements while preserving high performance. In addition, the amount of memory in standard computer hardware continues to grow while its price decreases. This has enabled Qlik to handle ever-larger volumes of data in memory. For example, a single server with 512GB of RAM can handle uncompressed data sets of nearly 4TB. Qlik’s compression scheme means that the more redundancy there is in the data values, the greater the compression. Unlike technologies that simply “support” multiprocessor hardware, Qlik is optimized to take full advantage of multiprocessor hardware. It efficiently distributes the number-crunching calculations across all available processor cores, maximizing both performance and the return on the hardware investment. In a clustered environment, Qlik apps can be hosted on different servers. For example, an app containing a smaller amount of aggregated data could run on a server with less memory, while an app with large amounts of detailed data could be configured to run on a larger server, all of this being invisible to the user.
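The point that more redundancy yields greater compression can be made concrete with dictionary encoding, a widely used columnar technique: each distinct value is stored once in a symbol list, and every row stores only a small integer code whose width depends on the number of distinct values. This is a generic sketch chosen for illustration, not a description of Qlik's internal storage format. The last lines check the sizing example in the text, treating 1 TB as 1,000 GB and assuming the 10% ratio holds.

```python
from typing import Dict, List, Tuple

def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once; each row keeps only an integer code."""
    symbols: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(symbols)
            symbols.append(value)
        codes.append(code_of[value])
    return symbols, codes

def decode(symbols: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from symbols and codes."""
    return [symbols[c] for c in codes]

def bits_per_code(num_symbols: int) -> int:
    """Minimum bits needed to distinguish num_symbols distinct values."""
    return max(1, (num_symbols - 1).bit_length())

# A highly redundant column: 3 distinct values repeated over 9,000 rows.
column = ["North", "South", "West"] * 3000
symbols, codes = dictionary_encode(column)
assert decode(symbols, codes) == column  # encoding is lossless
print(len(symbols), len(codes), bits_per_code(len(symbols)))  # 3 9000 2

# Sizing arithmetic from the text: ~4 TB of raw data at ~10% of original size.
raw_gb = 4000
compressed_gb = raw_gb / 10
print(compressed_gb, compressed_gb <= 512)  # 400.0 True
```

With only three distinct values, each row needs just 2 bits instead of a full string, which is why redundant columns compress so well. A column of unique IDs would gain almost nothing from this scheme.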

Figure: In-memory Data Flow

In addition, Qlik can be deployed so that one server runs in the background, extracting and transforming large amounts of data, while another server runs the user-facing app, free from the added burden of handling back-end tasks. An additional benefit to IT of this multitiered architecture is that the transactional data source only has to be accessed once; that data can then be reused in multiple Qlik apps without a fresh extract. Administrators can also configure Qlik to load only data that is new or has changed since the last load, greatly reducing the bandwidth required from any data source.
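Loading only new or changed data is commonly implemented with a high-water mark. The loader remembers the latest modification timestamp already loaded, fetches only rows modified after it, and merges them into the stored data by key. The Python below sketches that general pattern with a hypothetical `Row` record and `incremental_load` function. Qlik itself implements incremental loads in its own load script, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Row:
    key: int            # primary key in the source system (hypothetical)
    amount: float
    modified: datetime  # last-modified timestamp maintained by the source

def incremental_load(store: Dict[int, Row], source: List[Row],
                     watermark: datetime) -> Tuple[Dict[int, Row], datetime, int]:
    """Merge only rows modified after `watermark` into a copy of `store`.

    Returns the merged store, the new watermark, and how many rows were fetched.
    Deleted source rows are not detected by this simple timestamp approach.
    """
    # In a real pipeline this filter would be pushed down to the source query,
    # so unchanged rows never cross the network.
    changed = [r for r in source if r.modified > watermark]
    merged = dict(store)
    for r in changed:
        merged[r.key] = r  # insert new keys, overwrite updated ones
    new_watermark = max((r.modified for r in changed), default=watermark)
    return merged, new_watermark, len(changed)

last_load = datetime(2017, 9, 1)
store = {1: Row(1, 10.0, datetime(2017, 8, 30)),
         2: Row(2, 20.0, datetime(2017, 8, 31))}
source = [Row(1, 10.0, datetime(2017, 8, 30)),   # unchanged
          Row(2, 25.0, datetime(2017, 9, 2)),    # updated
          Row(3, 5.0, datetime(2017, 9, 3))]     # new
store, last_load, fetched = incremental_load(store, source, last_load)
print(len(store), store[2].amount, fetched, last_load.date())  # 3 25.0 2 2017-09-03
```

Only two of the three source rows are fetched. Running the load again immediately fetches nothing, which is where the bandwidth saving comes from.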


Segmentation

Segmentation is the process of dividing one Qlik application into multiple applications to optimize performance, security, scalability, simplicity, and maintenance. Data can be segmented by region or department, or a user may want to separate a small dashboard or summary app from another app that contains the detailed data. For example, a retail company may have a very large set of data and want to expose analytics (and, more importantly, insights) to retail analysts across departments, to executives, and to a few power analysts who do the bulk of the detailed analytics. Segmentation allows the large set of data that would otherwise reside together in one application to be broken up into chunks that serve those different groups. Each group can then use its own app without incurring the full RAM and CPU cost of the complete application. Note that segmentation requires very little maintenance or overhead to manage the segmented versions.

Chaining

Chaining refers to linking (or jumping) from one Qlik application to another while maintaining some sense of “state”, that is, the selections the user had made before following the link. Although these are separate Qlik apps, potentially even running on different servers, they can share selection states. For example, a CRM application may include several different customer subject areas, each corresponding to a department within the company. Qlik can be configured with a dashboard and a comprehensive app covering the overall customer base; these apps are then linked, or chained, to subject-area apps that are specific to each department. Thus, chaining is another method that allows the customer to manage apps that would contain too much data for their hardware to handle as one giant app.
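Segmentation by region, with a separate summary app, can be pictured as two data-preparation steps: partition the detail rows by a field, and pre-aggregate a much smaller summary. The sketch below uses plain Python lists and dictionaries as hypothetical stand-ins for Qlik apps (`segment` and `summarize` are illustrative names). In Qlik the equivalent work would be done in each app's load script.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]

def segment(records: List[Record], field: str) -> Dict[str, List[Record]]:
    """Split one large dataset into per-value segments, one per hypothetical app."""
    segments: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        segments[str(rec[field])].append(rec)
    return dict(segments)

def summarize(records: List[Record], field: str, measure: str) -> Dict[str, float]:
    """Build the small summary dataset: one aggregate per segment value."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[str(rec[field])] += float(rec[measure])
    return dict(totals)

sales: List[Record] = [
    {"Region": "North", "Product": "Bikes",   "Sales": 100.0},
    {"Region": "North", "Product": "Helmets", "Sales": 40.0},
    {"Region": "South", "Product": "Bikes",   "Sales": 70.0},
]
detail_apps = segment(sales, "Region")
summary_app = summarize(sales, "Region", "Sales")
print({region: len(rows) for region, rows in detail_apps.items()})  # {'North': 2, 'South': 1}
print(summary_app)  # {'North': 140.0, 'South': 70.0}
```

Each segment holds only its own rows, so a user of the North app never pays the memory cost of the South data, while the summary stays tiny regardless of how many detail rows exist.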

Figure: Segmentation & Chaining Data Flow

It is important to note that the techniques of segmentation and chaining can also be utilized together by segmenting apart multifaceted data views into subject-specific views and then chaining these separate views to each other.
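Chaining can be sketched as passing the user's selection state from one app to another. In the hypothetical model below, each `App` is just a list of rows plus a dictionary of selections. Following a chain copies over only the selections on fields the target app also has; here `Year` is dropped because the detail app has no `Year` field. This illustrates the concept and is not Qlik's app-linking mechanism. Combining it with segmentation would simply mean the target is one of the segment apps.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

Row = Dict[str, Any]

@dataclass
class App:
    """Hypothetical stand-in for a Qlik app: a name, its rows, and current selections."""
    name: str
    rows: List[Row]
    selections: Dict[str, Set[Any]] = field(default_factory=dict)

    def visible_rows(self) -> List[Row]:
        """Rows that match every current selection."""
        return [r for r in self.rows
                if all(r.get(f) in vals for f, vals in self.selections.items())]

def chain(source: App, target: App) -> App:
    """Jump from `source` to `target`, carrying selections on fields both apps share."""
    target_fields = {f for r in target.rows for f in r}
    target.selections = {f: set(vals) for f, vals in source.selections.items()
                         if f in target_fields}
    return target

overview = App("Customer overview", [
    {"Region": "North", "Year": 2017, "Sales": 140.0},
    {"Region": "South", "Year": 2017, "Sales": 70.0},
])
overview.selections = {"Region": {"North"}, "Year": {2017}}

detail = App("Sales detail", [
    {"Region": "North", "Customer": "Acme", "Sales": 100.0},
    {"Region": "North", "Customer": "Bolt", "Sales": 40.0},
    {"Region": "South", "Customer": "Cord", "Sales": 70.0},
])
chain(overview, detail)
print(detail.selections)                               # {'Region': {'North'}}
print([r["Customer"] for r in detail.visible_rows()])  # ['Acme', 'Bolt']
```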


On-demand App Generation

On-demand App Generation (ODAG) is a method that empowers users to automatically create a purpose-built analysis app each time they select a slice of a very large data source. The vast majority of users don’t want to analyze the entire Big Data source, and often they don’t initially know which “slice” of data they want to analyze in more detail. What is needed, therefore, is a way to quickly scan the entire data source for potentially interesting sections that warrant more detailed analysis. In some cases, this need can be met using chaining and segmentation: a summary app would be chained to other apps that each contain a segment of the data source in more detail. But what if there are too many potential segments to predefine as apps? What if the user doesn’t know which parts of the database they want to analyze? Free-form data discovery means that the user can explore in any direction, and that could mean a new app is needed every time an unexplored area is encountered.

Figure: On-demand App Generation Data Flow

On-demand App Generation can thus be very valuable in scenarios where the user may not know exactly which part of the database they want to analyze in detail. It typically consists of two different apps. First, the user is given a selection app, where they pick from a “shopping list” of particular subsets of data, such as a Time Period, Customer Segment, or Geography. This selection then triggers the immediate generation of a purpose-built analysis app that contains only the detailed data related to the selection. The user is then free to explore the selected detailed data in any direction using the in-memory capabilities of Qlik. Since these apps are governed by the standard Qlik Sense security rules, one can control who can access the detailed data versus the summary information. Users now have the freedom to “fail fast”, easily investigating different slices of the data source without needing to develop a new app each time they want to analyze a different set of data. This also allows the administrator to give users broad access to a data source of immense size, since only the requested slice of detailed data is actually managed in memory at any one time.

Other methods

There are other techniques one could use to access Big Data. A variety of partner technologies and tools are available that can be integrated with the Qlik platform. One could also develop a custom analytic app using JavaScript and the same APIs that On-demand App Generation apps use in the background. Just as with the standard ODAG extension that comes with Qlik Sense, user selections would spawn the generation of a filtered data set for analysis, via multiple APIs in Qlik Sense or the QMS API/EDX in QlikView. Developing such custom apps will likely require greater technical skills, but it removes the limitations imposed by standard Qlik functionality. For example, one could develop a single UI experience that contains both the selection and analysis apps.
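The selection-then-generate flow described for ODAG, and for custom apps built on the same APIs, can be sketched as a function that takes the user's selections from the selection app, filters the large source, and returns a small analysis app. Everything here is hypothetical: `AnalysisApp`, `generate_app`, and the `max_rows` guard (an assumed safeguard against overly broad selections) are illustrative names, not Qlik APIs. A real implementation would also push the filter down to the source, for example as a query predicate, rather than scanning rows in Python.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Set

Row = Dict[str, Any]
Selections = Dict[str, Set[Any]]

@dataclass
class AnalysisApp:
    """Hypothetical purpose-built app holding only the selected slice."""
    selections: Selections
    rows: List[Row]

def generate_app(source: Callable[[], Iterable[Row]], selections: Selections,
                 max_rows: int) -> AnalysisApp:
    """Filter the large source by the user's selections and build a small app.

    `max_rows` is an assumed governance limit: a selection that is too broad
    is rejected instead of being loaded into memory.
    """
    if not selections:
        raise ValueError("make at least one selection first")
    rows: List[Row] = []
    for row in source():
        if all(row.get(f) in vals for f, vals in selections.items()):
            rows.append(row)
            if len(rows) > max_rows:
                raise ValueError(f"selection exceeds {max_rows} rows; narrow it")
    return AnalysisApp(selections, rows)

def big_source() -> Iterator[Row]:
    """Stand-in for a very large table (scanned here; a real system would query it)."""
    for i in range(100_000):
        yield {"Year": 2015 + i % 3,
               "Segment": "SMB" if i % 2 else "Enterprise",
               "Amount": i}

app = generate_app(big_source, {"Year": {2017}, "Segment": {"SMB"}}, max_rows=20_000)
print(len(app.rows))  # 16666
```

Selecting only `{"Year": {2017}}` matches 33,333 rows and is rejected, which models the idea that only a bounded slice is ever held in memory at once.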

Comparison of different Big Data Access methods

Just as there is no single method to manage Big Data, there is no single best method to access and analyze Big Data sources. Customers should consider their specific user requirements and data sources to decide which method or combination of methods makes the most sense for them.

In-Memory
• Description: Highly compresses data into memory. Methods for data load can extend this even further.
• Applicable situation(s): The compressed data source fits into server memory; only aggregated or summary data is needed; only record-level detail over a limited time period is needed.
• Data volumes: 100s of millions to billions of rows

Segmentation & Chaining
• Description: Users move between multiple related segmented apps (e.g., by region).
• Applicable situation(s): A data source that is too unwieldy to be managed in server memory and can be split into predefined segments.
• Data volumes: 100s of millions to billions of rows per segmented app

On-demand App Generation
• Description: User selections spawn generation of a filtered data set and a purpose-built app for analysis.
• Applicable situation(s): A data source that is too unwieldy to be managed in server memory and cannot be split into predefined segments.
• Data volumes: Billions of rows

Other methods
• Description: User selections spawn generation of a filtered data set for analysis via multiple APIs in Qlik Sense or the QMS API/EDX in QlikView; partner technology; other custom solutions.
• Applicable situation(s): A custom UI is needed; the technology used to access the Big Data sources requires custom development.
• Data volumes: Billions of rows
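The "Applicable situation(s)" entries above can be read as a rough decision rule. The helper below is one hypothetical encoding of it, with an assumed ordering of checks. It is meant only to make the comparison's logic explicit, since real deployments often combine methods. The memory check reuses the 10% compression figure from the In-memory section as a best-case assumption. "In-Memory" also covers loading only aggregated data or a limited time window, both of which shrink the data until it fits.

```python
def fits_in_memory(raw_gb: float, ram_gb: float, ratio: float = 0.10) -> bool:
    """Rough check: does the data, compressed to `ratio` of its size, fit in RAM?

    The 0.10 default reflects the best case cited in the text; real ratios
    depend on how redundant the data values are.
    """
    return raw_gb * ratio <= ram_gb

def suggest_method(compressed_fits: bool, splits_into_segments: bool,
                   needs_custom_development: bool = False) -> str:
    """One hypothetical reading of the 'Applicable situation(s)' entries."""
    if needs_custom_development:
        return "Other methods"
    if compressed_fits:
        return "In-Memory"
    if splits_into_segments:
        return "Segmentation & Chaining"
    return "On-demand App Generation"

print(suggest_method(fits_in_memory(4000, 512), splits_into_segments=True))    # In-Memory
print(suggest_method(fits_in_memory(20000, 512), splits_into_segments=False))  # On-demand App Generation
```

Checking the in-memory case first reflects the text's observation that many customers find in-memory capabilities sufficient. The other methods come into play only once the compressed data no longer fits.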

Qlik and Big Data connectivity

Qlik is designed as an open platform and comes with a number of built-in and third-party connectivity options for Big Data repositories.

• ODBC connectivity. Qlik’s out-of-the-box ODBC connectivity includes drivers for Apache Hive, Cloudera Impala, and other software. Additional Big Data tools can be accessed using the vendor’s ODBC driver. For example, Micro Focus provides an ODBC driver for Vertica, its Big Data analytics platform.
• Data-source-specific connectivity. Qlik has partnered with multiple vendors to be certified on vendor-provided ODBC drivers. For example, MapR has certified Qlik for Apache Drill, and Qlik has received SAP certification for the SAP HANA ODBC driver.
• Partner-developed connectivity. A number of Qlik partners have developed connectors designed to work with specific data sources or applications for which Qlik does not already offer connectivity. This growing list of partner-developed connectors can be found at market.qlik.com.


Qlik goes the last mile with Big Data

One of the big challenges in telecom is the “last mile”: bringing the telephone, cable, or Internet service to its end point in the home. It is expensive for the service provider to fan out the network from the trunk or backbone, which means rolling out trucks, digging trenches, and installing lines. As a result, telecom providers sometimes pass high installation costs down to the customer, or neglect to go the last mile at all.

There is a “last mile” problem in Big Data, too. Today, most technology providers working on the problems of Big Data are focused on processing the data; they are focused on the backbone, to use the telecom analogy (or the plant, in the ore-mining analogy). But the last mile is where Qlik is focused. Qlik’s mission is to simplify decisions for everyone, everywhere, by empowering them to see the whole story that lives within their data. Qlik already does Big Data, and does it well. Many customers have successfully used Qlik to increase the value of their investment in Big Data technology by ensuring that it isn’t restricted to a few data scientists. Instead, Qlik empowers every user to access and collaborate on Big Data information in combination with traditional data sources, and then to use the powerful Qlik associative experience to gain new insight.


150 N. Radnor Chester Road Suite E120 Radnor, PA 19087 Phone: +1 (888) 828-9768 Fax: +1 (610) 975-5987



© 2017 QlikTech International AB. All rights reserved. Qlik®, Qlik Sense®, QlikView®, QlikTech®, Qlik Cloud®, Qlik DataMarket®, Qlik Analytics Platform®, Qlik NPrinting™, Qlik Connectors™, Qlik GeoAnalytics®, and the QlikTech logos are trademarks of QlikTech International AB which have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or registered trademarks of their respective owners.
