0 Replies Latest reply: May 19, 2017 10:09 AM by Evan Kurowski

    What is the optimal Qlik configuration when sourcing from Hadoop environments?

    Evan Kurowski

      Hello Community,


      There's been an ever-increasing interest in, and uptick of, projects which require sourcing from Hadoop environments.  It is inevitable; it's "big data".







      In Qlik literature, there's a regularly updated list of connectivity options, which includes Unix sourcing options like Cloudera Hive, Cloudera Impala, Hortonworks Hadoop Hive, etc. (see QlikView's Data Sources).



      But are the new entrants as "easy" & accessible as setting up a DSN and leveraging ODBC or OLE DB (was that ever easy?)? Are the Hadoop options being refined into more "out of the box" features?



      (As a personal opinion, the sheer number of product & software names on the Hadoop side seems excessive.  Are there just that many more players in the HUE database space?  Or are these products really distinct in terms of data capture features?  Hopefully no one is starting from some open-source base, adding their own twist, and then dropping yet another new product for us to digest in the marketplace.  Data connectivity seems pretty straightforward when we are choosing between an RDBMS such as an Oracle vs. a SQL Server instance.  But the parade of "names per release" adds confusion.  Kinda like when Betamax & VHS were duking it out, and everyone was just hoping VHS would win so that we could all move to a "standard".)


      So the goal here, again, is to connect Qlik applications to Unix-hosted databases (which are no longer categorized as Relational Database Management Systems, but rather receive the moniker "big data", regardless of their actual data volumes).

      From an app design standpoint the premise doesn't change: everything still falls into tables & fields.




      But from a connectivity standpoint, traversing the Unix ~> Windows chasm introduces more complexity.  Is this because Hadoop, as the newer frontier, lacks the kind of standardization that ODBC brought to Windows?  Or does it arise from having to navigate across operating systems?  How do we make this less murky?


      My question for the Qlik community is: what are your thoughts regarding optimal architecture when chaining applications that have to traverse from Hadoop to Qlik?



      Is it best to hook Qlik directly to the Hadoop source (and figure out which combination of connectors and settings is going to enable that)?  Or does it require intermediate ETL software to facilitate connectivity, data modelling, & data preparation?



      Any opinions or thoughts on the topic welcome! Thanks all!  ~E