13 Replies Latest reply: Oct 31, 2016 4:28 AM by James Randell

    Multiple DB's - duplicate Apps or Data Reduction

    James Randell

      Hi there,

       

      I've been doing a fair bit of research into deployment models for Qlik Sense, but thought I'd see if anyone had any insight into my query directly, rather than scenarios that don't quite fit ours.

       

      In our scenario, we have multiple data sources (70+ SQL databases) with the same schema but different data.

      One of the methods I have been looking at is to create a 'master' app and duplicate it per DB, providing a dynamic text load for the connection string in which I would specify which database to connect to, or another method to get the same effect (connect to a specific database depending on a particular setting/file/variable in the app).

       

      The other method is Data Reduction. I don't need to go into much detail here, other than it would take a user attribute such as a Source Database value from the authenticated user and apply that when accessing data in the model. The model itself would have to have connections to every database.

       

      From my point of view, I have a few pros and cons of each:

      Duplicating App:

      PROS: Completely segregated data, no accidental crossover due to 'missing' data reduction rules, small app sizes

      CONS: Additional overhead in duplicating apps (or at least creating a process to do this), not much documentation

       

      Data Reduction:

      PROS: Single app, already well documented

      CONS: Much larger QVD size could be problematic, accidental crossover due to missing data reduction rules

       

      I was wondering if anyone had any thoughts or insights as to best practice, or past examples of similar scenarios.

       

      Many thanks.

        • Re: Multiple DB's - duplicate Apps or Data Reduction
          Rohit Kumar

          Hi James,

           

          Well, the best possible architecture is a three-tier one. What that means is you have two apps: one is the master app, where you do all your scripting and modelling; then you create a second app that loads the first (master) app with the help of a binary load, and you do all the visualizations and variable creation there.
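
          For example, the whole script of the visualization app can be just a binary load of the master app (the app reference below is only a placeholder; use your own master app's ID or path):

          // BINARY must be the very first statement in the script.
          // It pulls the entire data model from the master app into this app.
          Binary [11111111-2222-3333-4444-555555555555];

          After that, no further LOAD statements are needed; you only build sheets and variables in this app.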

           

          Now you need to put your master (first) app on a reload schedule, and create another task so that when your master app finishes reloading, your secondary (visualization) app also reloads with the help of a triggered task.


          So while your master app is reloading, your users never feel that a data refresh is in progress, and when it completes it only takes a few seconds for the second app to refresh via the binary load.


          Yes, you may apply a filter before creating the QVD if the size is big, or you may ask the DB administrator to create a few views that contain only the columns you need.


          Please tell me, is this what you need, or something else?

            • Re: Multiple DB's - duplicate Apps or Data Reduction
              James Randell

              Hi Rohit, thanks for replying.

               

              I'm researching the method you've detailed today and getting my head around the various load methods and 2/3/4 tier architectures I've seen around on the forums.

               

              Do you have any specific links to documentation that breaks down how I would plan this for 70+ data sources? The QVD generator looks promising, but it seems it still requires a large manual element of adding new load scripts when a new data source needs loading, and then manually creating a child app as well, unless I've missed something.

               

              Many thanks.

               

              *edit: Just found this: Generic Script for QVD Generator (an automatic QVD generator). Currently having a play with it.

                • Re: Multiple DB's - duplicate Apps or Data Reduction
                  Rohit Kumar

                  Hi James,

                  I don't think there is a document quite like that; it's all about ETL practices. Let's take it one point at a time:

                  I don't think you have 70+ data sources; what you have is 70+ data tables. Either way, you should follow the QVD generation approach, because it will improve your application's reload performance, as QVDs compress data by up to 80 to 90 percent.

                   

                  Please also follow the three-tier architecture: all the ETL scripting in one application and the visualizations in another.

                   

                  Avoid joins; try to use Concatenate instead, and avoid functions like MonthStart in the load. It's better if you create a few views that merge a number of tables into one and then access those views from Qlik. The more joins there are, the more time any system needs to calculate and manipulate the data.
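
                  For example, where two tables share the same structure, you can append them with Concatenate instead of joining (table, field and file names below are only examples):

                  // Load the first table as normal.
                  Orders:
                  LOAD OrderID, CustomerID, Amount
                  FROM [lib://QVDs/CurrentOrders.qvd] (qvd);

                  // Append the second table's rows instead of widening the model with a join.
                  Concatenate (Orders)
                  LOAD OrderID, CustomerID, Amount
                  FROM [lib://QVDs/ArchivedOrders.qvd] (qvd);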

                   

                  I hope the above helps you; otherwise, tell me exactly what you want to know.

                    • Re: Multiple DB's - duplicate Apps or Data Reduction
                      James Randell

                      Hi Rohit,

                       

                      Thanks for that.

                       

                      Just to reiterate: yes, there are indeed 70+ data sources. The structure, as originally described, is 70+ SQL databases, all with the same schema but different data. (Small caveat: there are 300+ tables in the schema, but the data model itself won't include all of those.)

                       

                      I understand what you have said, have found the relevant literature on the concept, and have implemented my own 'QVD generator' script that loads data from a list of tables (in this case a small sample of 2 databases and a dozen or so tables) into single QVD files. That part seems ideal, as I can handle the data load from source separately from the rest of the work to be done in Qlik.

                       

                      The bit I'm having trouble finding is how to have multiple apps with the same structure but different source data. All the examples I have seen involve a sales app, an HR app, etc., perhaps loading data from each other instead of duplicating the load process, and that's fine. But in my situation, where each customer's data is entirely separate other than sharing the same schema, we almost need a 'duplication' process to simply copy an app that contains all the sheets and charts, but have that copy connect to each customer's newly generated QVD files.

                       

                      The solution would also have to allow changes to be rolled out across all the customers' apps, so that if we were to add a new dimension and a new chart to enhance functionality, it could then be replicated down to the child apps.

                       

                      I hope this makes sense. I'm researching as we speak to find a suitable way to achieve it, and I appreciate your interest in helping.

                       

                      Cheers!

                        • Re: Multiple DB's - duplicate Apps or Data Reduction
                          Rohit Kumar

                          Hi James,

                          With 70+ data sources you would need to create 70 different connections, and, let's say, 70+ different apps. To be honest, I have rarely seen a case with 70+ different data sources.

                           

                          Let's take one data source at a time: you create a connection and you get 300+ tables. Do you think all 300+ hold useful data? I don't think so, James. In practice only a few fact and dimension tables hold the useful data, so try to create views. A view combines data inside the data source so you can access the combined result, which is very useful.

                           

                          Think about it: if you combine 300+ tables in Qlik you will be using joins, which is a big task to execute and means slow performance.

                           

                          I faced this issue once. What I did was request views: as I said, the DB administrator did most of the calculations and manipulation, and I accessed only the final result in Qlik.

                           

                          The idea behind the three-tier concept is this: let's say your app needs 1 minute to reload; during that time your front-end users can't use the GUI. So we create one app for modelling, where the 1-minute reload happens, and a second app that the users access, which refreshes in a few seconds via the binary load, so users always feel the refresh takes almost no time.

                           

                          Now, do the 70+ data sources hold different clients' data? If yes, I don't think it's best practice to merge all the data into one app.

                           

                          I believe creating a new app for each client and giving the relevant users access to their own app is the best possible solution.

                           

                          Keep in mind that no matter how many tables you combine, always think about how long the engine needs to execute all the joins and functions, and whether users will be happy if the application doesn't work as fast as they want.

                  • Re: Multiple DB's - duplicate Apps or Data Reduction
                    James Randell

                    Just to round off this question, I've managed to come up with the following solution to our scenario.

                     

                    We have those 70+ data sources. We also have an amalgamated database which lists all of these sources, their locations (such as server and DB name) and whether they are to be brought into Qlik.

                     

                    Part of our tiered architecture involves a QVD generator, which loops through the above amalgamated database and generates a QVD file per table, per client into a location on the Qlik server.

                     

                    The next tier is a bit of a two-stage process. I have one central app that deals with the data load and transformation, loading data from the QVDs, but its QVD source is dynamic and changes based on where the app is published to. As part of this tier, I have used the QRS API to create our own API which, for each client, will:

                     

                    • Create a new stream with the name of the client
                    • Duplicate the master app
                    • Publish this duplicate to the new stream
                    • Reload the data

                     

                    As the master app's QVD source is dynamic and based on the name of the stream it's published to, it will go about loading data from those QVDs only.

                     

                    It also allows us to mass publish and reload at the press of a button when we make changes to the data load or a dashboard.

                      • Re: Multiple DB's - duplicate Apps or Data Reduction
                        Rohit Kumar

                        Hi James,

                        Can you please explain two of your key points in detail:

                        QVD source is dynamic and changes based on where the app is published to


                        and


                        QRS API to create our own API which for each client

                          • Re: Multiple DB's - duplicate Apps or Data Reduction
                            James Randell

                            Sure,

                             

                            For the first question: "QVD source is dynamic and changes based on where the app is published to"

                            The data load code in the Tier 2 master app looks for the QVDs based on the name of the stream it's published to. For example, if it's published to the 'ClientA' stream, it will load all the QVDs named 'ClientA-<table_or_view_name>.qvd'. It uses basic string replacement to load the correct files for that client. This app is also locked down. The layer that loads this was using direct SQL calls during development; we've now swapped over to these QVD loads and it's working like a charm. The next part I'm working on is performing a binary load from the UI layer so that we can develop the data model and UI at the same time, due to limited time and resources.
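
                            A simplified sketch of the idea (the folder connection name and the way vClientName gets populated are placeholders; in our case it's derived from the stream name during the publish step):

                            // The client/stream name drives which QVDs this copy of the app loads.
                            SET vClientName = 'ClientA';

                            Sales:
                            LOAD *
                            FROM [lib://ClientQVDs/$(vClientName)-Sales.qvd] (qvd);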


                            The second: "QRS API to create our own API which for each client"

                            I've created a small web application consisting of calls to the QRS API. First, it displays a page listing all of our clients who have Qlik enabled, based on our license with them. This comes from our own database. It then queries the QRS API for a list of streams that match the name of the client. If a stream doesn't exist, it gives you the ability to call the QRS endpoint to create it.


                            It goes on from there. After you create the stream, another call becomes available that will duplicate the Tier 2 app and publish it to the client's stream. There is also an option to both duplicate and re-publish, and then reload the data. It's our first crack at a semi-automated approach to managing client apps. It also allows us to roll out changes to apps very quickly. As you can imagine, if we have 70 clients, each with an app, and we update the master, we then need some way of distributing that app back out to each client. This web application allows us to do this very quickly, with a single button press.

                        • Re: Multiple DB's - duplicate Apps or Data Reduction
                          Bill Markham

                          What I do in that scenario is create a table [using a database, xls, csv, txt... whatever you fancy] and in that have one row for each data source with its name and data connection.

                           

                          Then in the script I loop round each row, running the load script and creating a discrete QVD for each data source. Then for the dashboards I load the QVDs I want, or more often all of them with a wildcard load.
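
                          For example, the dashboard side can be something along these lines (folder and file names are just examples):

                          // Pull in every per-source QVD the generator wrote, in one statement.
                          AllOrders:
                          LOAD *
                          FROM [lib://QVDs/*-Orders.qvd] (qvd);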

                            • Re: Multiple DB's - duplicate Apps or Data Reduction
                              Rohit Kumar

                              Hi Bill,

                              It would be a great help if you could show me the sample code or script so I can understand it easily; I'm confused about how to implement your logic and the loop logic in the script.

                              • Re: Multiple DB's - duplicate Apps or Data Reduction
                                James Randell

                                Yeah, that's the first tier - the QVD generator. I actually found a script here (Generic Script for QVD Generator) that I amended for my own needs.

                                 

                                My version first queries our customer table, but still has an inline load for the list of tables. The main bulk of the script is 3 loops:

                                 

                                // Outer loop: one source server per iteration (there is a data connection per server with the same name)
                                FOR iServerCount = 1 to FieldValueCount('servername')
                                  LET vServerName = FieldValue('servername', $(iServerCount));
                                  LIB CONNECT TO '$(vServerName)';

                                  // Middle loop: each client database
                                  FOR iDatabaseCount = 1 to FieldValueCount('databasename')
                                    LET vDatabaseName = FieldValue('databasename', $(iDatabaseCount));

                                    // Inner loop: each table in the inline list, stored out as its own QVD
                                    // (vQVDPath and vWebaddress are set in the omitted, site-specific part of the script)
                                    FOR iTableCount = 1 to FieldValueCount('TABLENAME')
                                      LET vTableName = FieldValue('TABLENAME', $(iTableCount));

                                      $(vTableName):
                                      SELECT * FROM [$(vDatabaseName)].[dbo].[$(vTableName)];
                                      STORE $(vTableName) INTO [lib://$(vQVDPath)/$(vWebaddress).$(vTableName).qvd] (qvd);
                                      DROP TABLE $(vTableName); // drop the in-memory table so the next pass doesn't auto-concatenate onto it

                                    NEXT iTableCount
                                  NEXT iDatabaseCount
                                NEXT iServerCount

                                 

                                I've omitted a few bits that are specific to us, but this covers the logic. The first loop goes through the servers and connects to each one (I created a connection per server with the same name). Then it loops through the databases of the clients on that server. Finally, it loops through the inline list of tables to generate a QVD per table. As I said, I took the Generic Script for QVD Generator and tweaked it for my needs.

                                 

                                Hope this helps.