Blog - Qlik and Solr Search: Focusing the Lens of Unstructured Analytics
In case you haven’t seen it, there is a super powerful unstructured search engine in the big data ecosystem called Solr. What’s great about Solr is that it can index just about anything: text, XML, JSON, PDF, Word, Excel, or pretty much any kind of text-based data. That means you can drop just about anything into Solr and have it searched by the Lucene core (which powers the Solr interface).
So, where does Qlik fit in, you ask? Well, let’s observe what a Solr query output looks like:
Hmmm, not very user friendly, not to mention it was somewhat slow.
A little bit about what we’re looking at for these examples: this data is the collective set of Enron emails from the company’s infamous collapse in the early 2000s. We’ve loaded this data set into our Cloudera cluster and indexed it using Solr.
Once this data was loaded and indexed we tested with a series of queries… A full query on someone with a lot of references such as Ken Lay can run upwards of 15 minutes to bring back every email that contains a reference to him.
Imagine tens or hundreds of users each waiting 10-15 minutes for a single question to be answered. It clearly dilutes the effectiveness of the engine as a business tool.
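For readers who haven’t queried Solr directly, here is a rough sketch of what a request like the Ken Lay search might look like when sent to Solr’s standard select handler. The host name, the collection name `enron`, and the field name `body` are assumptions made for illustration; they are not taken from our cluster.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: the host, collection name, and field name are
# placeholders for illustration, not the actual cluster configuration.
SOLR_BASE = "http://solr.example.com:8983/solr/enron"


def build_select_url(phrase, field="body", rows=10, start=0):
    """Build a URL for Solr's /select handler that searches one field for a phrase."""
    params = {
        "q": f'{field}:"{phrase}"',  # quoted phrase query against a single field
        "wt": "json",                # ask Solr for a JSON response
        "rows": rows,                # number of documents to return
        "start": start,              # offset into the result set
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


url = build_select_url("Ken Lay")
print(url)
```

The response to a URL like this is a raw JSON (or XML) document listing the matching emails, which is exactly the kind of output that is hard for a business user to read.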
Qlik has a tremendously powerful REST connector that is perfectly suited for connecting to sources such as Solr. (A great video on the Qlik REST connector can be found here: https://www.youtube.com/watch?v=FqwNU_pnFt4).
Qlik In-Memory Analytics with Solr
Armed with the REST connector and a few connection parameters, we can pull the entire Enron email dataset into the Qlik engine via Solr.
By pulling the entire data set, we ensure that all users have sub-second access to all the data down to the most granular level, and thanks to our associative search technology, all the data has been indexed and correlated in-memory. We can gain further insights by incorporating stock market data. Combining Enron’s stock performance with their emails tells an interesting story of rising email volume along with collapsing stock prices and rising trade volumes.
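Pulling a full index out of Solr over REST means walking the result set page by page. The sketch below shows that idea in plain Python using Solr’s `start` and `rows` parameters; it is only an illustration of the paging pattern, not the Qlik REST connector’s own implementation, and the base URL you pass in is whatever your Solr collection happens to be.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, query, start, rows):
    """Fetch one page from Solr's /select handler and return its 'response' object."""
    params = urlencode({"q": query, "wt": "json", "start": start, "rows": rows})
    with urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)["response"]


def fetch_all(base_url, query="*:*", rows=1000, fetch=fetch_page):
    """Yield every matching document by advancing 'start' until numFound is reached."""
    start = 0
    while True:
        page = fetch(base_url, query, start, rows)
        docs = page["docs"]
        yield from docs
        start += len(docs)
        # Stop on an empty page or once every matching document has been seen.
        if not docs or start >= page["numFound"]:
            break
```

The `fetch` argument is there so the paging logic can be exercised without a live Solr instance. For very large result sets, Solr also offers cursor-based paging (`cursorMark`), which is generally a better fit than large `start` offsets.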
Using a mix of visualization techniques, we can see a pretty interesting collection of data, including the famous “deleted emails” gap on the bottom right chart.
Performing some additional analysis, we can drill in on the height of the crash that also correlates with the spike in email volume, followed by a rapid drop in volume.
Making a few more selections, we can dive down into a specific name or comment to filter the result sets further.
This associative search allows us to dive down into the details of the “TO” elements of the data set and see the metrics affiliated with those names.
We can also jump over to the final sheet in the app and look at the body content of individual emails, filtered by the selections we made earlier in the application.
QIX API Powered Solr Search:
The above approach of using Qlik in-memory to front end the Solr search engine is just one of the many ways Qlik can access unstructured data in big data systems.
Let’s consider another application also using Qlik with Solr, this time with just the Qlik APIs. As a quick refresher, the Qlik engine (called QIX) is a fully API-enabled engine with tremendous extensibility that allows Qlik to plug into any web-based technology (like Solr). The awesome QlikSocial framework comes from the esteemed Johannes Sunden (https://github.com/johsund/QlikSocial), who adapted the webapp to connect to Solr on demand and build a full webapp from scratch.
We start with a search box… And our name(s) of interest:
Now, unlike the formatted Qlik Sense app, when a user hits the “search” bar, everything happens on the fly using the APIs.
Qlik will dynamically generate a REST connection to Solr, create an app and load the requested data into memory, and then build a web app around the data using bootstrap.js and Angular.
The webapp is still using the Qlik engine, so selections and the search engine are still available, but all the charts and graphics are HTML and D3.js charts, not Qlik. We’re just powering the app and the data interactivity with the QIX engine!
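To give a feel for what “using the APIs” means here, the QIX engine is driven by JSON-RPC messages sent over a WebSocket. The sketch below only builds the kind of messages involved in creating an app, setting a load script, and reloading it; it does not open a connection, it is not the QlikSocial code, and the app name and script text are placeholders. Creating the REST connection itself is left out.

```python
import json
from itertools import count

_ids = count(1)      # JSON-RPC request ids, incremented per message
GLOBAL_HANDLE = -1   # the engine's global object is addressed with handle -1


def rpc(method, handle, *params):
    """Build one JSON-RPC 2.0 request for the Qlik engine (not sent anywhere here)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "handle": handle,
        "params": list(params),
    }


def create_app(name):
    return rpc("CreateApp", GLOBAL_HANDLE, name)


def open_doc(app_id):
    # app_id would come from the engine's reply to CreateApp.
    return rpc("OpenDoc", GLOBAL_HANDLE, app_id)


def set_script(doc_handle, script):
    # doc_handle would come from the engine's reply to OpenDoc.
    return rpc("SetScript", doc_handle, script)


def do_reload(doc_handle):
    return rpc("DoReload", doc_handle)


# Placeholder values for illustration only.
messages = [
    create_app("solr-search-demo"),
    open_doc("APP-ID-FROM-CREATEAPP-REPLY"),
    set_script(1, "// load script reading from the Solr REST connection goes here"),
    do_reload(1),
]
for m in messages:
    print(json.dumps(m))
```

In a real session each message is sent only after the previous reply arrives, since the app id and document handle are returned by the engine rather than known up front.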
Solr is an extremely powerful unstructured search engine that can benefit from the speed and structure Qlik analytics can provide as a focusing lens on the core Solr search technology. That data can be consumed in a number of formats including a completely structured Qlik Sense app, or as an API powered web application without any Qlik UI components.