We are seeing lots of interest in and hype around the topic of “big data” because data volumes are on the rise and strategic thinkers across industries are looking for opportunities to maximize the value of that data. According to McKinsey Global Institute and others, the term “big data” refers to data sets whose size is beyond the ability of typical database software tools to capture, manage, and process within a tolerable elapsed time. Depending on the industry, this can mean data sets ranging from a few dozen terabytes to multiple petabytes. In addition, the term “big data” is associated not only with the volume of data but also with its variety (i.e. the types of data, whether structured or unstructured) and its “velocity”, i.e. the dynamic or changing nature of the data as new data flows into, and old data exits, a system.


During the last two decades, organizations have made significant investments in automating business processes using software applications that generate substantial amounts of data, which must then be manipulated before business professionals can usefully access, explore, and analyze it. This data is in myriad formats and its sheer volume is daunting. Business users are challenged to efficiently access, filter, and analyze the data — and gain insight from it — without using powerful data analytics solutions, which require specialized skills. They need better ways to navigate through the massive amounts of data to find what’s relevant to them, and to get answers to their specific business questions.

 

The growth in adoption of massively parallel processing (MPP) solutions for handling ever larger volumes of data, whether structured or unstructured, is driving demand for analysis tools that enable business users to derive insights from the data.

 

QlikView takes a two-pronged approach to this challenge:

 

Firstly, QlikView’s approach has always been to understand what it is that business users require from their analysis, rather than to force-feed a solution that might not be appropriate. Providing the appropriate data for the appropriate use case is more valuable to users than providing all the data, all the time. For example, local bank branch managers may want to understand the sales, customer intelligence, and market dynamics in their branch catchment area, rather than for the entire nationwide branch network. With a simple consideration like this, the conversation moves from one of large data to one of relevance. In any organization, the number of people who need to analyze extremely large data volumes is typically relatively small. For example, a retail bank might have thousands of branches but only about 100 business analysts in a centralized, corporate role. While branch managers only need slices of data that are relevant to their operations, the corporate analysts may need access to much larger data volumes. QlikView is designed to accommodate both environments and enables users to focus on the data that is most relevant and of the highest value to them and their area of interest. By taking appropriate slices of the data, big or small, QlikView acts as an analytical environment downstream of the data source, giving business analysts and casual business users alike the insight they need from the data that is most relevant.
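
As a rough illustration of the branch example above, the following sketch contrasts a branch manager's slice with a corporate analyst's aggregated view. It is plain Python, not QlikView script or any QlikView API; the branch names, products, amounts, and the function names branch_slice and totals_by_branch are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records; every value here is made up for illustration.
SALES = [
    {"branch": "Leeds", "product": "mortgage", "amount": 250_000},
    {"branch": "Leeds", "product": "savings", "amount": 12_000},
    {"branch": "York", "product": "mortgage", "amount": 180_000},
    {"branch": "York", "product": "loan", "amount": 9_500},
    {"branch": "Hull", "product": "savings", "amount": 4_000},
]


def branch_slice(records, branch):
    """Return only the detail rows relevant to one branch manager."""
    return [r for r in records if r["branch"] == branch]


def totals_by_branch(records):
    """Return an aggregated view across all branches for a corporate analyst."""
    totals = defaultdict(int)
    for r in records:
        totals[r["branch"]] += r["amount"]
    return dict(totals)


if __name__ == "__main__":
    # A branch manager sees only the rows for their own branch.
    print(branch_slice(SALES, "Leeds"))
    # A corporate analyst sees totals for the whole network.
    print(totals_by_branch(SALES))
```

The point of the sketch is only that both audiences draw on the same source, while each works with a data set sized to their own question.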

 

Secondly, QlikView has been addressing, and continues to address, the big data challenge by ensuring that targeted QlikView applications can handle the amounts of data needed to keep each application relevant for business users:

 

  • Recent trends in large memory spaces available on standard Intel hardware allow QlikView to handle ever-larger volumes of data.
  • QlikView best practices promote an architecture-led deployment when handling very large data sizes, such as making proper use of distributed servers in a clustered environment; constructing appropriate applications for the intended audience; using sophisticated data reload engines; and using document chaining where necessary to allow aggregated views to be coupled with detail-level views while optimizing hardware resources.
  • QlikView provides an open data protocol (QVX) via a series of APIs for developers, allowing them to interface with the APIs of Hadoop-based data source providers. QlikView's QVX protocol can be used to connect to Hadoop-based systems via two different methods:
    • Disk-based QVX file extracts from Hadoop (PUSH)
    • “Named pipe” QVX connector for Hadoop (PULL)
  • A QVX SDK is available to all third-party developers who wish to build custom connectors for any system with an open API. QlikTech has partnered with DataRoket, which has an ETL tool to connect with Hadoop; in addition, DataRoket has produced a QVX named pipe connector for QlikView that links directly to its ETL tool.

 

In conclusion, the QlikView Business Discovery platform is all about relevance. It’s about putting tools in the hands of business users to enable them to ask and answer their own streams of questions, without having to go back to IT or business analysts for a new report or a new query every time they come up with a follow-on question.

 

(My colleague, Elif Tutuk, also wrote a blog post entitled 'An App Model Approach to Big Data' that is well worth a read to learn more about the QlikView approach to Big Data.)