This demonstration showcases Qlik's ability to manage and consume large data sets in a governed environment using the On-Demand Application Generation technology built into Qlik Sense. A user can browse summary information about banking trends and then drill down into the transactions "on demand" to get the details. In this example, the user can only get the details once they have filtered the number of accounts down to under 100. The resulting app is the user's own personalized app to explore and create new content.
This demo leverages over 16 million IoT sensor and maintenance readings sourced from Kafka and StreamSets to create a Qlik app that allows deep analytics on well maintenance issues. With this app, you can drill down to a very granular level to see performance problems tied to real-world well production issues in Alberta, Canada.
Navigating the analytics labyrinth with integration of Kudu, Impala, and Qlik.
Using Hadoop for Big Data analytics is nothing new, but a new entity has entered the stale file format conversation with the backing of Cloudera – you might have heard of it; it's called Kudu.
What is Kudu? Let's first take a step back and think about the dullest topic in the universe: file system storage formats. Flat files, Avro, Parquet, ORC, etc. have been around for a while and all provide various advantages and strategies for data access optimization in an HDFS construct. However, they all suffer from the same issue… static data that can only be appended to – unlike a real database.
So, enter Kudu – defined by Apache: "Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer." Deconstructing that message – Kudu acts as a columnar database that allows real database operations that aren't possible in HDFS file formats. It is now possible to interact with your Hadoop data using INSERT, UPDATE, DELETE, ALTER, and similar operations.
This means not just read/write capabilities for Hadoop, but also interactive operations without having to move to HBase or other systems. IoT use cases, interactive applications, write-back, and traditional data warehousing are now possible without adding layer upon layer of additional technologies.
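To make that concrete, here is a minimal sketch of the kind of row-level statements Impala accepts against a Kudu-backed table (the table and column names here are hypothetical, not from the demo):

    -- Row-level operations that plain HDFS file formats cannot do
    UPDATE sales SET unit_price = 9.99 WHERE item_sk = 42;   -- modify rows in place
    DELETE FROM sales WHERE order_status = 'CANCELLED';      -- remove rows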
Now that we understand what Kudu can do, how does this benefit Qlik? Kudu is fast, columnar, and designed for analytics – but with the ability to manipulate and transform the data to power new use cases.
Let’s start simple by showing how easy it is to move some data from an Impala table on Parquet into Kudu.
Starting in Hue we need to do some basic database-like work. To put data into a table, one needs to first create a table, so we’ll start there.
Kudu uses standard database syntax for the most part, but you'll notice that Kudu is less specific and rigid about data types than your typical relational database – and that's awesome. Not sure if your data is a varchar(20), or if it is smaller or larger? With Kudu you don't have to care – just declare it as a basic string.
Numerics are basic as well; there are just a few types to choose from based on the length of the number. This makes creating columns and designing a schema very straightforward and easy to set up. It also reduces data type problems when loading data in.
With the basic syntax of table creation understood, we will go ahead and create the table that we are going to copy the Parquet data into. It's worth noting that there are some differences here versus creating a Parquet table in Hue.
First: A Kudu table needs to have at least 1 primary key to be created.
Second: A Kudu table needs a partition method to distribute those primary keys.
Referencing the schema design guide above, we are going to use a HASH partition with 3 buckets (since we have 3 worker nodes).
Summarizing, we have a bunch of strings, a few integers, and some floating decimals for prices and profit. We’ve identified our keys and specified our partitions – let’s roll!
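As a sketch of what that looks like (illustrative column names, not the demo's actual schema), the CREATE TABLE combines loose STRING/INT/DOUBLE types with the required primary key and HASH partitioning:

    CREATE TABLE sales (
      customer_sk  INT,
      item_sk      INT,
      store_name   STRING,      -- no varchar length to guess
      order_status STRING,
      unit_price   DOUBLE,
      profit       DOUBLE,
      PRIMARY KEY (customer_sk, item_sk)
    )
    PARTITION BY HASH (customer_sk, item_sk) PARTITIONS 3  -- one bucket per worker node
    STORED AS KUDU;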
The query runs for a second and voilà – we have our new (albeit empty) table. Next, we need some data. We have an existing table that we would like to copy over into Kudu. We will run another query to move the data and make a little tweak on the keys to match our new table.
We had to cast our customer_sk and item_sk columns from STRING in Parquet to INT in Kudu, but that's pretty easy to do in SQL, as sketched below.
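Something along these lines (assuming the Parquet source table is named sales_parquet; the real demo names may differ):

    -- Copy from the Parquet-backed table into Kudu,
    -- casting the string keys to match the INT primary keys
    INSERT INTO sales
    SELECT CAST(customer_sk AS INT),
           CAST(item_sk AS INT),
           store_name,
           order_status,
           unit_price,
           profit
    FROM sales_parquet;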
We run the INSERT query and boom… We have our data moved over into Kudu, and even better – that table is now immediately available to query using Impala!
With the data loaded into Kudu and exposed via Impala – we can now connect to it with Qlik and start building visualizations.
With that, we start the process of building a Qlik app.
Opening Qlik Sense, we will create a new connection to our cluster and select our new table.
Once we have our data – we’ll build an app to directly query Kudu (versus loading the data into memory) to take advantage of the speed and power of Impala on Kudu. This change is accomplished with a slight alteration in the syntax to identify dimensions and measures.
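In the load script this uses Qlik's Direct Discovery syntax, which declares dimensions and measures but leaves the data in the source. A minimal sketch against our hypothetical table (the connection name is assumed) might look like this:

    LIB CONNECT TO 'Cloudera_Impala';   // hypothetical connection name

    DIRECT QUERY
    DIMENSION store_name, order_status  // loaded into the associative model
    MEASURE unit_price, profit          // aggregated by Impala at chart time
    FROM sales;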
We now have live queries running against Kudu datasets through Impala.
The great part about Kudu is that we're just getting started with the possibilities of how we can leverage the technology with Qlik. Some things we're cooking up for the not-too-distant future involve write-back with Kafka and Qlik Server Side Extension integration – so stay tuned.
This is a purely technical demo showing how to use one of Cloudera Impala's more advanced features: complex types. Complex types (also referred to as nested types) let you represent multiple data values within a single row/column position. They differ from the familiar column types such as BIGINT and STRING, known as scalar types or primitive types, which represent a single data value within a given row/column position.
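For example, Impala lets you query a nested ARRAY column as if it were a joined table (the schema here is hypothetical, for illustration only):

    -- customers(id BIGINT, name STRING,
    --           orders ARRAY<STRUCT<order_id: BIGINT, total: DOUBLE>>)
    SELECT c.name, o.order_id, o.total
    FROM customers c, c.orders o;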
In this demo, Qlik uses our direct query capability to connect to Impala to run interactive queries with a TPC-DS data set stored in Parquet format. What's unique about Qlik is that even though the data is not being stored in memory initially, we still have the associative experience available to the user. This capability executes queries in parallel against the Impala engine to achieve maximum performance.
This app analyzes every US Government contract for the 2011–2016 fiscal years. It includes over 18.7 million contracts with a total spend of over $2.6 trillion. Data was sourced from www.usaspending.gov and has been enriched with geospatial data for mapping capabilities. It displays key spending metrics such as total spend, number of contracts, number of vendors, and spend over time.
Spark is one of the greatest Big Data advancements to appear on the scene since Hadoop. In this demo, Qlik leverages the power of Spark machine learning to process raw transactional data into "Market Baskets". A Market Basket is a categorization of similar things sold in conjunction with each other, e.g. if I buy Product A, then Products B, C, and E are often sold with it, but not Product D. This application merges the original point-of-sale data with the Spark-processed data in memory to analyze the Market Baskets.
This demo is based on 20+ data sources that have been loaded into HDFS and then transformed into a pure in-memory Qlik app. The datasets that have been loaded into Cloudera come from a variety of sources, including the CDC, the World Health Organization, Twitter, flight stats data, weather data, Texas hospital data, and other clinic sources. This visually stunning app showcases Qlik's ability to tell a powerful story with data.
Attunity Replicate for SAP is a high-performance, automated, and easy-to-use data replication solution that is optimized to deliver SAP application data in real time for Big Data analytics. It moves the right SAP application data easily, securely, and at scale to any major database, data warehouse, or Hadoop, on premises or in the cloud. This solution builds on decades of leadership in enterprise data replication and SAP integration.
This application demonstrates a direct load from SAP ECC into Cloudera. The data is loaded directly from SAP into HDFS and then turned into Impala tables, to which Qlik connects and applies complex transforms, adding business-friendly terms and time-series analytics capabilities.
The Centers for Medicare and Medicaid Services (CMS) defines Quality Measures as “tools that help us measure or quantify healthcare processes, outcomes, patient perceptions, and organizational structure and/or systems that are associated with the ability to provide high-quality health care and/or that relate to one or more quality goals for health care. These goals include: effective, safe, efficient, patient-centered, equitable, and timely care.”
This application presents an approach that a health system may want to take to visualize their data so that they can target the right areas for improvement. The data set contains 62.5 million quality records for 2.76 million patients and covers 8 health systems, with 685 Practice Groups employing 5,000 physicians.
Cloudera is rich in metadata useful for understanding the data behind the analytics visualized by Qlik. However, this metadata is somewhat scattered across different areas of the Cloudera ecosystem. In this application, Qlik pulls valuable metadata from Cloudera Navigator and Cloudera Manager using REST APIs. These APIs give insight into query usage, query performance, and metadata tags using published API calls.
Combining this REST data with a series of looping SQL calls against Impala, as sketched below, we are able to associate database, table, and column statistics with the Navigator and Manager data. By combining this data, we can build a full understanding of the relevant Cloudera metadata for Qlik to analyze. This application also powers the selection criteria for the upcoming Cloudera Data Explorer.
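A hedged sketch of that looping pattern in Qlik load script (the connection name and the field names returned by the SHOW statements are assumptions and depend on your driver):

    LIB CONNECT TO 'Cloudera_Impala';   // hypothetical connection name

    TableList:
    LOAD name AS table_name;
    SQL SHOW TABLES;

    FOR EACH vTable IN FieldValueList('table_name')
      ColumnStats:
      LOAD *, '$(vTable)' AS source_table;   // tag each stats row with its table
      SQL SHOW COLUMN STATS $(vTable);
    NEXT vTable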
Cloudera Altus is a cloud service platform with services that enable you to use CDH to analyze and process data at scale within a public cloud infrastructure. It is designed to provision clusters quickly and to make it easy for you to build and run your data workloads in the cloud.
Altus works within the cloud service provider architecture. That framework provides an excellent foundation for Qlik Sense in a cloud-based solution powered by Altus. This dashboard application is powered by Altus running a TPC-DS data set on S3 with Impala as the query engine.
Demonstrates how Qlik, Cloudera, and DataRobot can be integrated to provide a modern analytics stack for an anti-money laundering use case. The fictional PomBar Bank has just released an international payments system, powered by Ripple. They want to extend visibility of their AML/KYC system into their Ripple transaction data.
Cloudera's Enterprise Data Hub provides the storage and infrastructure for a secure, governed anti-money laundering system, centralizing data across all legacy banking systems, as well as from Ripple's API. DataRobot - a highly automated platform for machine learning - is used to implement an anomaly detection routine, as part of PomBar's AML workflow. Qlik then provides an efficient end-user platform for monitoring, visualizing, and transforming that data.
This is the "GA" of the Cloudera Data Explorer based off the Data Concierge platform developed by Dennis Jaskowiak and Qlik DACH SA team.
Philip Corr of Bardess Consulting rebuilt the code to enhance usability and create guardrails for user interaction. The application is powered by the Cloudera Data Catalog developed by Dave Freriks. The data catalog collects and associates metadata from Impala, Cloudera Navigator, and Cloudera Manager.
This software is released "AS-IS", but we welcome improvements to the base code, as there are many cool things that could be added to this concept.