A Brief Conversation about Data Governance

 

Reader: “Whoa, whoa, hold on a second. Really, Mike? A post on Data Governance? Don't you represent QlikView!? Shouldn't you be blogging about Business Discovery, Big Data or those sexy Data Visualizations!?”

Mike T: “Easy now, take a moment and breathe. <sarcastic>You seem to really know your trendy labels, don't you?</sarcastic> Before we can discover our business, visualize our data or understand whether our Big Data's signal-to-noise ratio is even relevant, something more needs to happen. Applications and data are typically prepared from gathered requirements before they are deployed to the masses. However, it is this preparation process that determines the accuracy, consistency, assurance and overall longevity of the BI solution; aspects commonly overlooked when a proper Data Governance framework is NOT in place.”

Reader: “A proper Data Governance what?!”

Mike T: “Exactly!”

 

Ah-ha!

Now that I’ve gotten your attention, I’d like to introduce you to my new series on – yes, Data Governance. Over a series of articles I will introduce you to the concept of Data Governance and the common symptoms and problems that arise in its absence. I’ll also include an example where an agency of the US Government could have saved millions annually if a Data Governance framework had been in place. With help from products such as the QlikView Governance Dashboard and QlikView Expressor, I’ll also cover solutions and best practices that can help increase data confidence and reduce risk in the key organizational information used to make decisions.

 

It’s a Problem

Over the course of my career I have seen many organizations quickly adopt a BI solution and jump right into creating reports and dashboards for one or a few specific needs, while giving little thought to the rest of the BI solution and how others may benefit from previous work. So what happens? Another application is then developed with its own requirements, possibly using data and attributes similar to the first. When applications are developed in an independent and ad hoc manner (as in many organizations), business models, data definitions and semantics can be stored and defined inconsistently. This causes inaccuracies, which only delay decisions as users search for the truth in their data. As enterprises strive to consolidate data and express a need for data repurposing, it becomes critical to introduce Data Governance standards. Many analysts have established that a high percentage of BI projects fail to meet their objectives, citing a variety of issues including failure to implement a centralized data repository, inconsistent data models, little to no metadata management and lack of authority to institute and uphold best practices.

 

Well?

Mike T: So Reader, will you join me in my next post, where I will address these challenges and solutions in greater detail? Hopefully you will see that QlikView is much more than just visualizing and analyzing data. It’s about driving decision-making using the right data.

 

Mike Tarallo
Senior Product Marketing Manager
QlikView and QlikView Expressor
Follow me: @mtarallo

Sports play an important role in building a cohesive and inclusive society, capable of uniting people from diverse cultural and religious backgrounds through playing or supporting sport together. Ultimately, we love to cheer on our compatriots and favorite athletes to success, or to see how, by improving their performance, the underdog can come out on top.  That’s why we can understand the widespread excitement and huge following for the Superbowl in the US, the Champions League football final in Europe, the Tour de France in the Alps, and, once every four years, the global games that are the Olympics.

True sports fans know the history of their sport; who the most successful and least successful players or athletes are; which year they were most successful; how many games or matches each participant has won or lost.  The fact is, when you enjoy something, it’s easy to learn about it.  Of course, the same goes for the athletes and their management and sports teams – they know the history.  They know who has been strongest over the years.  They know who made errors, what the competition is likely working on and what equipment is being used.  However, increasingly fans and professionals are pushing their understanding further and learning more through deep data analysis.

Sports enthusiasts and professionals will tell you that they have been analyzing historical data and looking for new opportunities for years. It’s only with the emergence of new analytic technologies, tools and techniques that such analysis has become almost widespread. Further still, the statistics around games have become ‘gamified’ themselves – consider fantasy leagues or online or console games, where fans can play at being a manager, and learn from it, by trying to create the most successful team through the manipulation of a set of facts and statistics.

Back in the real world – to do great analysis of any sport you first need access to data in a structured and coherent form. Beyond that you need an intuitive, user-driven analysis experience that allows users to explore the data and make discoveries seamlessly. With the PGA and European Tours in full swing, QlikTech has created a Pro Golf App that lets everyday users (and golf enthusiasts) visualize, analyze, compare, and contrast tour data from 2004 through the latest tournament scores this year, as well as World Ranking and FedEx Cup Ranking. We’ve previously done the same for the 2012 Global Games, the Grand Prix, and many more sporting events. Thanks to the availability of data, the rise of high-speed internet and social networks for sharing insight, and the ability to access information on the go with mobile devices, sports enthusiasts can use the Pro Golf App to explore the data by year, player, tournament, country, and more, and ask questions such as the following (a rough sketch of the kind of calculation behind such questions appears after the list):

- What percent of tournaments played does Rory McIlroy win?

- How many tournaments have South Africans won this year?

- Which German golfer held the No. 1 ranking for just two months?

- Which four countries account for 78% of the major championship wins?
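
Under the hood, questions like these come down to simple aggregations over tournament results. As a rough illustration only (the records, field layout and player names below are hypothetical, not the Pro Golf App's actual data), a win percentage could be computed like this in Python:

```python
from collections import defaultdict

# Hypothetical tournament results, for illustration only:
# (year, tournament, player, finishing_position)
results = [
    (2012, "Tournament A", "Player 1", 1),
    (2012, "Tournament B", "Player 1", 12),
    (2012, "Tournament B", "Player 2", 1),
    (2013, "Tournament C", "Player 1", 1),
    (2013, "Tournament C", "Player 2", 40),
]

played = defaultdict(int)   # tournaments entered per player
won = defaultdict(int)      # tournaments won per player

for year, tournament, player, position in results:
    played[player] += 1
    if position == 1:
        won[player] += 1

# "What percent of tournaments played does a given player win?"
for player in sorted(played):
    pct = 100.0 * won[player] / played[player]
    print(f"{player}: {won[player]} wins in {played[player]} starts ({pct:.0f}%)")
```

The same counting pattern, grouped by country or by major championship instead of by player, would answer the other questions above.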

 

Analytics as a Game in the World of Sports

 

Nothing gets a sports fan going more than when someone disagrees with a fact about their favorite player, team or country, or relays information that they don’t believe. With the availability of data and the tools to unearth a key fact, statistic, or comparative piece of information, the amount of collaboration and debate around sports analytics has risen hugely in the past few years; there have even been Hollywood movies about it! The challenge is to understand the best way to present this data to the players, coaches, media, and fans and extend our enjoyment of the games even further through analysis and discovery.

 

(This is a repost of a blog published recently on http://www.itbriefcase.net/)

Richard Feynman was one of the greatest physicists of the last century. His work spanned many disciplines and his curiosity drove him to explore and understand a variety of problems in the universe. He was awarded the Nobel Prize in Physics in 1965.

 

Feynman, when facing a new problem, used a very simple approach to solve it. He first asked questions and inquired about the details. After that, he retired to think about it, and when he came back he usually had the solution.

 

His ability to find the core of a problem and describe it in a simple, yet precise way, was unmatched. The method is summarized (probably by his friend and fellow physicist Murray Gell-Mann) as “The Feynman Problem Solving Algorithm”:

 

  1. Write down the problem.
  2. Think very hard.
  3. Write down the solution.

 

Intended as a joke, this sounds like a ridiculously simplified workflow for problem solving. One can hardly imagine that it could serve as instructions for how to solve a problem.

 

But it can. In fact, it is even a very useful approach. Seriously.

 

It can successfully be used when you build QlikView applications. There you encounter different kinds of problems: figuring out which data to load, how to model that data, and how to write complex formulae.

 

Using the algorithm, you will find that the hardest part is the first point – to write down the problem. Or rather – to understand the problem in the first place. Points two and three often come automatically if you’ve done the first point properly. Just formulating the problem in precise words will help you understand it. And understanding the problem is the core of all problem solving.

 

The exercise of formulating the problem in words, and explaining it to your users or to your peers, will force you to start thinking, which means that you start working on point two. You may even write the first QlikView scripts to test different concepts, which means that you start working on point three.

 

This only shows that the three points are interconnected and that you will need an iterative approach to get it right. I often start working on all three points in parallel, but all the time I am aware that I need to understand the problem and think hard before I can deliver the final solution.

 

Some methods that I find useful:

  • Listen to your users. They are the best source when it comes to understanding what the application should do and what the goals are. Which KPIs? Which dimensions? Discuss with them. Ask them questions.
  • In data modeling, always ask yourself what each table or record represents. Which field, or combination of fields, uniquely defines a record? Study the data. Understand the data. (A small sketch of such a uniqueness check follows this list.)
  • Visualize your data model. Draw it on a piece of paper, if needed. Name the tables so that you understand what each record represents. Don’t load a table unless you understand what its content is and how it relates to existing tables.
  • Start small: just one or two KPIs and a few dimensions. Make sure you understand the data model and its calculations before you expand it.
  • A smoker, stuck with a problem, usually takes a break. He stops working and has a cigarette instead. He starts thinking. Taking a break in order to think is a very good habit that non-smokers should adopt, too. So, once in a while, walk away from the computer just to think.
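
For the data-modeling point above, one concrete way to "study the data" is to test whether a candidate key really is unique. Here is a minimal sketch in Python, using hypothetical field names and sample rows; in practice you would run the same check against your actual source tables:

```python
from collections import Counter

# Hypothetical order lines: does OrderID alone define a record,
# or do we need the combination (OrderID, LineNo)?
records = [
    {"OrderID": 1001, "LineNo": 1, "Product": "A", "Amount": 120},
    {"OrderID": 1001, "LineNo": 2, "Product": "B", "Amount": 45},
    {"OrderID": 1002, "LineNo": 1, "Product": "A", "Amount": 80},
]

def is_unique_key(rows, fields):
    """True if the combination of fields uniquely identifies every row."""
    keys = Counter(tuple(row[f] for f in fields) for row in rows)
    return all(count == 1 for count in keys.values())

print(is_unique_key(records, ["OrderID"]))            # False: OrderID repeats
print(is_unique_key(records, ["OrderID", "LineNo"]))  # True: composite key
```

If the single field fails the test, the table is at a finer grain than you assumed, which is exactly the kind of understanding point one of the algorithm is after.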


Simplicity. Feynman was a genius.

 

HIC

John Sands

Tag Anyone?

Posted by John Sands Jul 11, 2013

The nature of data is changing. At the moment, organizations gather data from many different sources including loyalty cards, machine logs, sensor arrays, and social media sentiment analysis (even if they don’t always analyse the data enough).

But what about the future? I recently read a very interesting book ‘Everyware: The Dawning Age of Ubiquitous Computing’ by Adam Greenfield or, as he puts it, “the colonization of our everyday life” by technology. He talks about the many different ways computing will change and spread from discrete devices to existing within the very fabric of everyday life.

 

This is happening quickly: soon, clothing with RFID (Radio Frequency Identification) tags that let climate controllers know your preferences in temperature and humidity will be a reality. Floors that can monitor footfall and your presence in the room could have advantages too, for example for the old and infirm: if they fall, the floor can sense it has happened and notify the emergency services. As I said, don’t imagine this is all in the future: right now in Japan, RFID tags have been fitted in some items of clothing so that when an elderly person uses a pedestrian crossing, the light stays red for traffic a few seconds longer.

RFID technology eliminates the need for line-of-sight scanning, as the tag itself contains an antenna that can transmit the information to a receiver.

 

Here are some examples of where RFID technology is already being used.

 

 

One of the major limiting factors holding back this type of technology has been the shortage of IP addresses, but with the arrival of IPv6, the next version of the Internet Protocol, this restriction will be removed: its 128-bit address space provides roughly 3.4 × 10^38 addresses, so potentially everything in the world could have its own IP address. Coupled with the way RFID technology is becoming cheaper and more readily available, this is a movement that will not go away. It’s the arrival of the Internet of Things. Obviously this may raise ethical and privacy issues - very topical considering the recent news concerning the American National Security Agency and the PRISM program.


Organisations are already struggling with the data they hold now, and the phrase “we are data rich and information poor” has never been more accurate. We are only going to get more data; it’s just a case of how we use it. So prepare yourself, and make sure the data you hold works for you and gives you the insight to help you make quality business decisions.

For a Business Discovery platform to meet the expectations of today’s information worker (fast response times, high degrees of interactivity, self-service data exploration and discovery) and scale across an enterprise, it’s now widely accepted that the use of in-memory processing is required.  Here’s a quote from our partner Teradata, which comes from a disk-based heritage: “Naturally for the data which is being used heavily on a day to day basis then there will a more than convincing business case to store this data in-memory to deliver the performance which is required by the business” (http://blogs.teradata.com/anz/are-in-memory-databases-the-answer-or-part-of-the-answer/ )

 

This is no surprise to QlikTech, as this is the approach we pioneered 20 years ago, and it is now being taken up by pretty much all competing vendors.

 

However, we sometimes come across claims that visualization tools querying disk-based databases directly are a viable alternative approach. To suggest that a deployment relying only on dynamic queries to disk will meet performance expectations is simply not realistic. While some business discovery providers (including QlikView, via the Direct Discovery capability) can directly query sources such as Teradata, it’s important to acknowledge that direct query alone is a) much slower and b) generates network traffic in an unbounded fashion. Whilst a direct query capability such as Direct Discovery is a very valuable ‘relief valve’ for access to very large data sets, ALL data discovery providers (including QlikView) recommend the use of a performance optimization layer. In fact, this is one of the defining characteristics of data discovery software according to Gartner (data discovery is their term for Business Discovery):

 

“Data discovery tools are an increasingly prominent class of BI offering that provide three attributes:

1. A proprietary data structure to store and model data gathered from disparate sources, which minimizes the reliance on predefined drill paths and dimensional hierarchies.

2. A built-in performance layer that obviates the need for aggregates, summaries or pre-calculations.

3. An intuitive interface enabling users to explore data without much training.”*

 

The reality is that any BI system meant to satisfy business users has to replicate some or all of the data to deliver acceptable performance. Different vendors take different approaches: QlikView uses its associative in-memory engine (which offers up to 90% compression of source data), while other vendors use less intelligent in-memory caches; but all the same, they still replicate data. For 20 years QlikTech has developed an in-memory approach that provides a unique high-performance, associative, intelligent data store. In addition we have developed tooling that very effectively manages the data to allow QlikView deployments to scale to many thousands of concurrent users. Any vendor claiming to deliver genuinely usable, fast discovery without recourse to some data replication in memory (or the use of an in-memory database further down the stack – still a rarity) is misguided.
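
QlikView's engine and its exact compression scheme are proprietary, but as a generic illustration of why columns of repetitive business data compress so well when held in memory, here is a simple dictionary-encoding sketch in Python. This is an assumption made purely for illustration, not QlikView's actual algorithm:

```python
# Dictionary-encode a column: store each distinct value once and keep
# only a small integer index per row. Repetitive columns (countries,
# product names, status codes) shrink dramatically this way.
def dictionary_encode(column):
    symbols = []              # distinct values, stored once
    index_of = {}             # value -> position in symbols
    encoded = []              # one small integer per row
    for value in column:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        encoded.append(index_of[value])
    return symbols, encoded

def decode(symbols, encoded):
    return [symbols[i] for i in encoded]

# Hypothetical, highly repetitive column of one million rows
column = ["Germany", "Sweden", "USA", "Sweden"] * 250_000
symbols, encoded = dictionary_encode(column)

assert decode(symbols, encoded) == column
print(len(symbols), "distinct values for", len(encoded), "rows")
```

Because typical BI columns contain relatively few distinct values repeated millions of times, storing each value once and referencing it by index is one reason an in-memory store can hold far more source data than its raw size suggests.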

 

Related content: QlikView Scalability White Paper. QlikView Architecture and Systems Resource Usage Technical Brief

*Source: Gartner ‘The Rise of Data Discovery Tools’, 26 September 2008, ID:G00161601

A New York Times article from a few months ago really startled me.  The headline was “U.S. to Be World’s Top Oil Producer in 5 Years, Report Says.” This contradicted everything I had known about the slowdown in American oil production.  So what was the game-changer?

There are several components of the sudden shift in the world’s energy supply, but the prime mover is a resurgence of oil and gas production in the United States, particularly the unlocking of new reserves of oil and gas found in shale rock. The widespread adoption of techniques like hydraulic fracturing and horizontal drilling has made those reserves much more accessible.

Until recently, my concept of oil drilling looked something like this:

Find a pocket of oil buried underground, put an oil well on top of it, drill until you hit that oil pocket, and pump until the well runs dry. But most of the untapped oil in the world is not in nice neat reservoirs. It is trapped inside rock formations that are spread over great distances horizontally. To unlock these reserves, two key technologies are required:

  • hydraulic fracturing or “fracking”: creating fractures in rocks and injecting fluids to force the cracks open and release trapped oil and gas
  • horizontal drilling: the ability to drill horizontally and thus follow the natural direction of the oil and gas deposits


What does this have to do with QlikView? Well, we don’t know who first coined the phrase “Data is the new oil” but it is clear that organizations of all kinds are scrambling to unlock the value of their data. Unfortunately, they are still using old technology that can only drill down into data and unlock small bits of value at a time. This is not only true of traditional BI techniques of creating data cubes (thus limiting users to summary views of the data and preventing them from finding the insights hidden in the details) and predefining drill paths (thus limiting users to a narrow line of thought through a report). This is also true of many of the new generation BI tools that promise “ad hoc query” and “multidimensional drill paths” with snazzy visualizations to boot, but hide the fact that underneath the glamour is the same old SQL query on a single data source.

You may already know about QlikView’s distinctive associative experience, which shows with every click what data across the entire data model is associated and what is not. What is often less well known among those looking for a truly intuitive and agile BI platform is that QlikView delivers that experience across different datasets simultaneously.
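
QlikView's associative engine is proprietary, but the core idea of linking separate datasets on their shared fields can be sketched in a few lines of Python. The tables, field names and values below are hypothetical, purely to illustrate following a selection from one dataset into another:

```python
# Two hypothetical datasets that share an AccountID field.
contracts = [
    {"AccountID": "A1", "Product": "RoadDesign", "Revenue": 50_000},
    {"AccountID": "A2", "Product": "BridgeModel", "Revenue": 75_000},
    {"AccountID": "A3", "Product": "RoadDesign", "Revenue": 30_000},
]
usage_logs = [
    {"AccountID": "A1", "Product": "RoadDesign", "Hours": 420},
    {"AccountID": "A1", "Product": "BridgeModel", "Hours": 15},
    {"AccountID": "A3", "Product": "RoadDesign", "Hours": 12},
]

def select(table, field, value):
    """Rows in one dataset matching a simple user selection."""
    return [row for row in table if row.get(field) == value]

def associate(selected_rows, other_table, key="AccountID"):
    """Rows in another dataset linked to the selection via a shared key field."""
    keys = {row[key] for row in selected_rows}
    return [row for row in other_table if row[key] in keys]

# Select accounts that actually *use* BridgeModel, then follow the
# association into the contract data to see what those accounts pay for.
selected_usage = select(usage_logs, "Product", "BridgeModel")
print(associate(selected_usage, contracts))
# -> A1's contract: the account uses BridgeModel but only licenses RoadDesign,
#    the kind of cross-dataset insight the Bentley example below describes.
```

A real associative model propagates selections across every loaded table in both directions; this toy version follows only one hop, but it shows why holding the datasets side by side matters.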

A brilliant example of this was shared by a customer speaker at a recent QlikView Technology Summit. Larry Griffiths, BI Manager at Bentley Systems, Inc., works at a software company with a broad portfolio of products catering to the infrastructure engineering industry. Their primary data source was SAP BW. With their existing BI tool, they struggled to enable simple tasks like pipeline and contract reviews for sales managers. They also had other data sources, such as log data that tracked actual usage of their numerous software products. Trying to analyze all that information together was difficult. After spending a year and a half searching for a solution, they selected QlikView. In the words of the speaker, the breakthrough came when, with QlikView, “…within half a day, on a little 4GB VMWare instance, we had pulled in 200 million rows of contract data and usage data.  A problem we haven’t been able to solve for a long time was solved.”

After they deployed QlikView, users from across the company found many ways to extract real value from data, via the ability of QlikView to drill horizontally and ‘frack’ the data for its valuable insights.  For example, the product development group made use of an affinity market basket analysis app, which answers questions such as “what products are used most with what other products?” By doing so, they saved over $2.5 million in software development costs in the last year by dropping unnecessary software integration projects.
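
Affinity (market basket) analysis of this kind ultimately rests on counting how often products appear together for the same account. Here is a minimal sketch of that counting step in Python, with hypothetical product names and baskets (not Bentley's actual data or app):

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage "baskets": the set of products each account actually uses.
baskets = [
    {"RoadDesign", "BridgeModel"},
    {"RoadDesign", "BridgeModel", "RailLayout"},
    {"RoadDesign", "RailLayout"},
]

# Count how often each pair of products is used together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# "What products are used most with what other products?"
for (a, b), count in pair_counts.most_common():
    print(f"{a} + {b}: used together by {count} accounts")
```

Ranking the pairs (or normalizing the counts into measures such as support and lift) is what turns raw usage logs into decisions about which integration projects are worth keeping.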

They are experiencing amazing discoveries by tying multiple data sources together. The latest effort is combining revenue data from SAP BW with usage data from the application server with training data from their learning management system (LMS). What insights could they extract with this? For one, they will be able to correlate the training level of their user base with higher usage of their software and increased revenue. This gives them actionable insight across the entire company to improve customer training programs, upsell training seats, and increase revenue via better bundling of relevant software and training.

To visualize this with the drilling analogy: other BI tools can only deliver a siloed view of data, which not only keeps data fragmented but also leaves users frustrated and resorting to Excel to get at the insights they need, which defeats the purpose of the investment in BI. QlikView gives them a holistic view, which enables them to extract maximum value from data. Isn’t that what we all want?


Click here to watch a recorded webinar by QlikView and Bentley Systems.

 

Do you have a story of “horizontal drilling” with QlikView?  Please share it below!
