
Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

Employee


Greetings Qlik Community. I am pleased to introduce our newest guest blogger, David Freriks. David is a Technology Evangelist on the Innovation and Design team at Qlik. He has been working in the "big data" space for over three years, starting with Hadoop and moving on to Spark in this continuously evolving ecosystem. He has 18+ years in the BI space, helping launch new products to market. David is here today to discuss a few approaches to how Qlik can address... "Big Data".

"Big Data"

The term "big data" has been thrown around for several years, and yet it continues to have a very vague definition. In fact, no two big data installations and configurations are alike (insert snowflake paradigm here). It's no surprise that, given its unique nature, "big data" cannot be forced into an abstract model. These types of data systems evolve organically, morphing with ever-changing business requirements.


If we accept that no two big data systems are alike, how can one deliver analytics from those systems with a singular approach?


Well, we can’t – in fact it would be quite limiting to do so.  Why?


Picking one and only one method of analysis prevents the basic question “What problem is the business user trying to solve?” from being answered. So what do I mean by “picking one version of analysis”? 


The market breaks it down into the following narrow paths:

  • Simple SQL on Hadoop/Spark/etc.
  • Some form of SQL caching on Hadoop/Spark/etc.
  • ETL into a database, then analysis

These solutions have their place, but to pick only one greatly limits a user’s ability to succeed, especially when the limits of each solution are reached.


So how does Qlik differentiate itself from the narrow approaches and tools that exist in the market?


Simple answer: variety. Qlik is in a unique position, offering a set of techniques and strategies that cover the widest range of capabilities within a big data ecosystem.


Below are some of the approaches Qlik offers the big data community:


  • In-Memory Analytics: Get the data you need and accelerate it, which makes a great solution for concepts such as data lakes. Qlik creates a "Synch and Drink" strategy for big data: fast and powerful, but it does not retrieve all the data, which may be fine given the requirements. Think of it as a water tower for your data lake. Do you really need one petabyte of log data, or just the errors and anomalies from the last 30 days? (See the first sketch after this list.)

  • Direct/Live Query: Sometimes you do need all the data, or a set too large to realistically fit into memory, or latency is a concern; in those cases, use Qlik in live query mode. The catch with this strategy is that you are completely dependent on the source system for speed. This scenario works best when an accelerator (Teradata, Jethro, atScale, Impala, etc.) is used as a performance booster. Qlik enables this scenario through our Direct Discovery capability. (See the second sketch after this list.)

  • On-Demand App Generation: This is a "shopping cart" approach that lets users select from a cart of content curated from the big data system. By guiding users to make selections, this technique reduces the raw volume of data returned from the system to just what they need. It also allows IT to place controls, security, and limiters in front of those choices, so that mistakes (such as trying to return all records from a multi-petabyte system) can be avoided.

  • API - App on Demand: This is an API evolution of the shopping cart method above, embedded within a process or environment of another interface or mashup. This technique allows Qlik apps to be created temporarily (i.e., as session apps) or permanently, based on inputs from another starting point. It is an ideal solution for big data partners or OEMs who would like to build Qlik integration directly into their tools.
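
To make the first two approaches concrete, here are two minimal load script sketches. The connection name, tables, and fields are illustrative assumptions, not references to any real deployment.

First, a "water tower" extract for the in-memory approach: pull only the errors and anomalies from the last 30 days of a hypothetical Hive log table into the QIX engine, letting the source do the filtering:

    // Sketch only: assumes a Hive/ODBC connection named 'HadoopLake',
    // a hypothetical table logs.app_events, and Hive-style date functions
    LIB CONNECT TO 'HadoopLake';

    ErrorEvents:
    LOAD *;
    SQL SELECT event_time, host, error_code, message
    FROM logs.app_events
    WHERE severity = 'ERROR'
      AND event_date >= date_sub(current_date(), 30);

Second, a Direct Discovery sketch for live query mode. DIMENSION fields have their distinct values loaded into memory, while MEASURE fields stay in the source system and are aggregated there on demand:

    // Sketch only: table and field names are hypothetical
    LIB CONNECT TO 'HadoopLake';

    DIRECT QUERY
        DIMENSION product_id, region, sale_date
        MEASURE sales_amount, quantity
        FROM sales.fact_sales;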


In summary, to avoid limited interactions with whatever "big data" system you use, you need options. Qlik is uniquely positioned in this area thanks to the power of the QIX engine and our ELT + Acceleration + Visualization three-in-one architecture. Since no two big data systems are alike, Qlik offers the most flexible set of solutions in the market, able to adapt to any data scenario, big or small.


Regards,


David Freriks

Emerging Technology Evangelist

Follow me: David Freriks (@dlfreriks) | Twitter


12 Comments
Contributor III

David, a nice and brief summary, but you forgot to mention one important use case for Hadoop implementations: realtime/streaming analytics. This is where Tableau's LIVE mode wins over Qlik's Direct Query (which has a very long list of limitations). The API option is not really feasible for your regular customer.

I think most of the Qlik customers I spoke with at Qonnections use Hadoop to offload heavy ETL (Hive, Impala, etc.), which is really not a Qlik function or a feature that Qlik can take credit for.

Employee

Interesting question. As you know, traditional Hadoop (HDFS + MapReduce) is a batch system and completely useless for realtime and streaming analytics of any kind (regardless of which query tool you use). Only when you enter the world of Spark Streaming does this become a possibility. However, most companies use Spark as a processing engine, not a data repository, which complicates things even more...

Hive/Impala are not ETL tools; they are SQL constructs built on schemas over Hadoop. They do have transformation capabilities (HiveQL), which Qlik natively supports (not all vendors support all functions; the one you mentioned above struggles with arrays and maps, for example). But Qlik can add even more power on top of the processed data with our LOAD capability, applied after the SELECT SQL statements that run natively against Hadoop.
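
As a small illustration of that pattern (table and field names are hypothetical), the SELECT below executes natively on Hadoop, while the preceding LOAD runs in the QIX engine and layers Qlik transformations on top of the returned rows:

    Events:
    LOAD
        event_id,
        Upper(severity) AS Severity,                     // transformed in Qlik, not in Hive
        If(Upper(severity) = 'ERROR', 1, 0) AS IsError;  // derived flag computed in Qlik
    SQL SELECT event_id, severity
    FROM logs.web_events;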

Again, it's all about the use case - Qlik offers more flexibility than just being a simple query tool.

Contributor III

Thanks for the reply, David. I think the bigger issue with Qlik is that you cannot load entire tables from Hadoop (if the project requires it). Of course, one can argue that you can aggregate/filter things before they get to Qlik, but this is not always possible. Direct Query can help with some use cases, but the feature is very limiting. In fact, I know one of Qlik's large clients pushed for that feature because they wanted to load very large tables into Qlik and failed to do it. These are the limits of an in-memory technology that is not distributed.

I had discussions about this with Qlik at HIMSS and at Qonnections, and many Qlik customers agreed this is going to be a problem very soon; your customers will start seeking other tools that can work with entire tables, not just aggregated versions of them.

Hopefully Qlik will come up with something sooner rather than later, because Hadoop is not going anywhere.

Valued Contributor III

Hi Boris

I agree with you. When I create transaction table reports, I often run into this issue: it runs out of memory, so I use an action button to limit the data load.

You mentioned that this is a limitation of Qlik's in-memory technology. Since you know Tableau, may I ask: does Tableau have an in-memory feature?

Paul

Contributor III

Hi Paul, first off, I am not affiliated with Qlik or Tableau. We are Qlik customers, and I am a very big fan of QlikView; we got tremendous value out of it, and after 3 years we still have a lot of ideas and value to get out of it.

Now that my company is going to embark on a big data journey, I keep asking myself whether Qlik would be a good BI tool, or whether we would need something designed to work with Hadoop (e.g., Datameer) or a tool that can use data from Hadoop in a realtime fashion (old-school tools that generate queries against the source systems while the user interacts with the BI app).

While I am not a big fan of Tableau, it does have two modes: LIVE mode and offline. Offline mode is similar to Qlik's in-memory engine: you preload data into a proprietary file (like a QVW), and it then gets loaded into RAM. It is not as fast as Qlik's engine, and Tableau is not as flexible and powerful as QlikView, but it is much nicer and friendlier than QlikView. Qlik launched Sense to compete with Tableau in that space.

LIVE mode in Tableau is basically a combination of an old-school query generator and a highly optimized caching engine. Of course, it is much slower than offline mode, but Tableau still does a great job with its caching engine. While it is much slower, you can put it on top of a high-performance MPP database, an in-memory database, or Hadoop (via a fast SQL-on-Hadoop engine like Impala or Spark SQL), and this is where Qlik fails to deliver; you have to use some of the cumbersome workarounds outlined by David. If you can load your entire data set from Hadoop, or aggregate or process it in advance before it gets to Qlik, you will be fine; but in our case we might be talking about terabytes of data on which we need to calculate metrics on the fly, so that would not work.

A very interesting discussion, though, and I am sure Qlik is already thinking about how to get a foothold in the big data space.

Employee

So, an interesting conversation. LIVE mode, whether in Qlik, Tableau, etc., is completely dependent on the source system for performance. Caching (which Qlik also has in live mode) is only good for the selections and data already returned into the charts on the page. Make a new selection, and it's back to the source system...


Very few companies go direct to native Hadoop because of performance. Qlik users who expect sub-second response aren't happy when they enter the live query world of waiting 5, 10, 60+ minutes for a report to return (Tableau users may be used to this level of performance). That's why you need options.

It's fair to say "I can't load an entire Hadoop table into memory", but how does one visualize 1 billion records of textual log data? It all comes back to the use case. What is the data you need, and how are you going to show it in order to tell a story and build a narrative? Data for the sake of data isn't helpful.

By the way, please watch this video of Qlik Sense running on one TRILLION records.

https://www.youtube.com/watch?v=ZnMDeg8V2sg

Contributor III

Agreed. Most companies I've talked to use Hadoop to do batch ETL, then filter/pre-aggregate and load the data into their enterprise DW, or load prepped tables into Hive/Tez or Impala and then use SQL to pull data into the BI tool of their choice. A lot of them talk about doing this in realtime, but few actually do. Still, this market will only grow; again, the example is Datameer, which got a lot of funding. Tableau can do it too (yes, it will be slow). Also, a lot of the effort in Hadoop over the past few years has gone into sub-second response times (Hive LLAP, HAWQ, improvements to Impala), so it keeps getting faster, and on-the-fly calculations over TBs of data are now a reality.

Thanks for the video, but here is the catch: they use very simple expressions like SUM or COUNT. I actually did similar demos internally to brag about QlikView performance: I had one billion rows loaded into a QVW, opened it on our HP server with 1 TB of RAM, and it was super fast. The secret: simple expressions and a super simple data model. In real life, projects are a bit more complicated than that. Direct Query takes it to the next level with fast backend systems like Teradata or Vertica, but again, this feature comes with a very long list of limitations. The big one for me was the lack of set analysis support.
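
For readers who haven't used it, set analysis is Qlik's expression syntax for evaluating a measure over a modified selection state. A typical chart expression looks like the sketch below (field names are illustrative), and expressions of this form were exactly what a Direct Discovery measure could not evaluate at the time:

    // Sum of sales for 2017, regardless of the user's current year selection
    Sum({<OrderYear = {2017}>} sales_amount)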

Thanks again for a productive discussion! Go Qlik!!

Employee

Great discussion. Boris, you are correct that Direct Discovery does not support set analysis; however, we have a Technology Partner that does. Jethro is one of the fastest SQL-on-Hadoop solutions on the market today, and it has recently introduced Qlik set analysis support via Direct Discovery. Jethro now allows Qlik set analysis syntax to be passed directly through the query, and it handles the calculations itself. We have live Direct Discovery demos hosted at http://jethrodata.qlik.com/ that show the power of Qlik on Jethro + Hadoop.

Employee

Right, Hugo. Here's a video of set analysis for Direct Discovery in action with Jethro.

https://www.youtube.com/watch?v=leJSWeFaWX8

Contributor III

Yep, this is very impressive indeed; I saw a demo at Qonnections.


So when will Qlik (or should I say Thoma Bravo) acquire Jethro?

Not applicable

Glad I found this thread – a very informative discussion.

Being a technology vendor that focuses on this exact area, I wanted to share some of our observations. I hope my unavoidable vendor-biased perspective will be balanced out by some useful info :)

Most people who use BI on large datasets in Hadoop take the approach of a selective extract (into a QVD or TDE) and load it into memory. The discussion here, however, is about what to do when the extract-and-load method is not a practical option: the extract itself is still too big, the lag time is too high, or other reasons. In such cases, live access to the data at its source (Direct Discovery, Live Connect) is the preferable approach.

Indeed, Tableau's Live Connect is the more mature interface, but Qlik's Direct Discovery can be made to overcome its key limitations and provide similar functionality. Specifically, the integration work done by Jethro and Qlik addresses the following issues:

  • Ability to map a complex data model (e.g., a star schema) into Direct Discovery (see the sketch below)
  • Enabling in-DB functionality that emulates Qlik's set analysis functionality and syntax
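
One common way to realize the first point, sketched here with assumed names: flatten the star schema into a view on the SQL-on-Hadoop side, then point Direct Discovery at that single view. The DDL can be passed straight through from the load script, though depending on the driver it may need to be run outside Qlik:

    LIB CONNECT TO 'HadoopLake';

    // Pass-through DDL: pre-join a hypothetical star schema into one queryable view
    SQL CREATE VIEW sales.v_sales_star AS
    SELECT f.sale_date, f.sales_amount, f.quantity, s.region, p.product_name
    FROM sales.fact_sales f
    JOIN sales.dim_store s ON f.store_id = s.store_id
    JOIN sales.dim_product p ON f.product_id = p.product_id;

    DIRECT QUERY
        DIMENSION region, product_name, sale_date
        MEASURE sales_amount, quantity
        FROM sales.v_sales_star;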

A great video by Dave demonstrating this solution: https://www.youtube.com/watch?v=leJSWeFaWX8

Once you are able to connect your BI tool live to your datasets in Hadoop, you're likely to encounter the next challenge: the performance of SQL-on-Hadoop tools (e.g., Hive) is usually too slow for any acceptable interaction. As noted earlier, there is constant progress being made in this area, and the latest versions of Impala and HAWQ are significantly faster than earlier releases. Still, the architecture used by all of these tools is nearly identical: they are all MPP / full-scan engines. And while this architecture was effective with high-end appliances (e.g., Teradata), it is less efficient with the switch to off-the-shelf hardware, especially when combined with much larger dataset sizes and more complex workloads.

At Jethro, we address this issue with a different architecture: full indexing. By pre-indexing all the columns, Jethro can serve typical BI queries much faster than the full-scan engines, and with significantly fewer cluster resources.


We have a live demo of Jethro + Qlik available at: Qlik Sense.


As for the future of Jethro, we're enjoying the singles' dating scene :)

Valued Contributor III

Hi David

I like your argument: why would one need to load very big data into one table, and how would you analyze it?

Paul

