Not applicable

Multiple DBs - duplicate Apps or Data Reduction

Hi there,

I've been doing a fair bit of research into deployment models when it comes to Qlik Sense, but thought I'd see if anyone had some insight into my query directly, rather than scenarios that don't quite fit ours.

In our scenario, we have multiple data sources (70+ SQL databases) with the same schema but different data.

One of the methods I have been looking at is to create a 'master' app and duplicate it per DB, providing a dynamic load of the connection string in which I would specify which database to connect to, or another method to the same effect (connect to a specific database depending on a particular setting/file/variable in the app).
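
In rough terms, I'm imagining something like this per duplicated app (the connection, database and table names here are just placeholders):

// Sketch only: one variable drives which database this copy of the app reads
LET vDatabase = 'CustomerA_DB';   // set differently in each duplicated app
LIB CONNECT TO 'SQL_Server';      // a single data connection to the server
Orders:
SQL SELECT * FROM "$(vDatabase)".dbo.Orders;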

The other method is Data Reduction. I don't need to go into much detail here, other than it would take a user attribute, such as a Source Database value, from the authenticated user and apply that when accessing data in the model. The model itself would have to have connections to every database.
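
Roughly speaking, that would be Section Access with a REDUCTION field, along these lines (the user and field values are placeholders):

Section Access;
LOAD * INLINE [
ACCESS, USERID, REDUCTION
USER, DOMAIN\USERA, CLIENTA
USER, DOMAIN\USERB, CLIENTB
];
Section Application;
// every table to be reduced needs a matching REDUCTION column holding the client value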

From my point of view, I have a few pros and cons of each:

Duplicating App:

PROS: Completely segregated data, no accidental crossover due to 'missing' data reduction rules, small app sizes

CONS: Additional overhead in duplicating apps (or at least creating a process to do this), not much documentation

Data Reduction:

PROS: Single app, already well documented

CONS: Much larger QVD sizes could be problematic; accidental crossover due to missing data reduction rules

I was wondering if anyone had any thoughts or insights as to best practice, or past examples of similar scenarios.

Many thanks.

13 Replies
rohitk1609
Master

Hi James,

Well, the best possible architecture is 3-tier. What it means is you have two apps: one is the master app, where you do all your scripting and modeling; then you create a second app that pulls in the master app via a binary load, and you do all the visualizations and variable creation there.

Now you need to put your master (first) app on a reload schedule, and create another task so that when the master app is refreshed, your secondary (visualization) app is also refreshed via a task trigger.


So while your master app is reloading, your users never feel that a data refresh is in progress, and once it completes, it only takes a few seconds for the second app to refresh via the binary load.
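
For illustration, the binary load is just the first statement of the second app's script, something like this (the path is only an example; on a Qlik Sense server you would reference the master app by its ID instead):

// Must be the very first statement in the visualization app's script
Binary [lib://Apps/MasterModel.qvf];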


Yes, you may apply filters before creating the QVDs if the size is big, or you can ask the DB administrator to create views which contain only the columns you need.


Please tell me, is this what you need, or something else?

Not applicable
Author

Hi Rohit thanks for replying.

I'm researching the method you've detailed today and getting my head around the various load methods and the 2/3/4-tier architectures I've seen on the forums.

Do you have any specific links to documentation that break down how I would plan this for 70+ data sources? The QVD generator looks promising, but it seems it still requires a large manual element of adding new load scripts when a new data source needs loading, and then manually creating a child app as well, unless I've missed something.

Many thanks.

*edit: Just found this: Generic Script for QVD Generator (an automatic QVD generator). Currently having a play with it.

rohitk1609
Master

Hi James,

I don't think there is any document like that; it's all about ETL practices. Let's talk through it one point at a time:

I don't think you have 70+ data sources; what you have is 70+ data tables. Either way, you should follow the QVD generation approach, because it will improve your application's reload performance, as QVDs compress data by up to 80 to 90 percent.

Now please follow the 3-tier architecture: all ETL scripting in one application and the visualization in another.

Avoid joins; try to use concatenate instead, and avoid functions like MonthStart(). It's better if you create views which merge a number of tables into one, and then access those views from Qlik. The more joins there are, the longer any system takes to calculate and manipulate the data.

I hope the above helps you; otherwise, tell me what exactly you want to know.

Not applicable
Author

Hi Rohit,

Thanks for that.

Just to reiterate: yes, there are indeed 70+ data sources. The structure, as originally described, is 70+ data sources, all SQL, all with the same schema but different data (one small caveat: there are 300+ tables in the schema, but the data model itself won't include all of those).

I understand what you have said, have found the relevant literature on the concept, and have implemented my own 'QVD generator' script that loads data from a list of tables (in this case a small sample of 2 databases and a dozen or so tables) into individual QVD files. That part seems ideal, as I can handle the data load from source separately from the rest of the work to be done by Qlik.
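
For anyone reading later, the generator is essentially a loop of this shape (the client, table and connection names are placeholders):

// Store each source table into its own QVD, prefixed with the client name
LET vClient = 'ClientA';
LIB CONNECT TO 'SQL_Server';
FOR EACH vTable IN 'Orders', 'Customers', 'Invoices'
  [$(vTable)]:
  SQL SELECT * FROM dbo."$(vTable)";
  STORE [$(vTable)] INTO [lib://QVDs/$(vClient)-$(vTable).qvd] (qvd);
  DROP TABLE [$(vTable)];
NEXT vTable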

The bit I'm having trouble finding is how to have multiple apps with the same structure but different source data. All the examples I have seen involve a sales app, an HR app, etc., perhaps loading data from each other instead of duplicating the load process, and that's fine. But in my situation, where each customer's information is entirely separate apart from sharing the same schema, we almost need a 'duplication' process to simply copy an app containing all the sheets and charts, but have each copy connect to that customer's newly generated QVD files.

The solution would also need a way to roll changes out across all the customer apps, so that if we were to add a new dimension and a new chart to enhance functionality, the change could be replicated down to the child apps.

I hope this makes sense. I'm researching this as we speak to find a suitable way to achieve it, and I appreciate your interest in helping me.

Cheers!

rohitk1609
Master

Hi James,

So, 70+ data sources means you need to create 70+ different connections, and, let's say, 70+ different apps. Honestly, I have rarely seen a case with 70+ different data sources.

Let's take one data source at a time: you create a connection and you get 300+ tables. Do you think all 300+ hold useful data? I don't think so. What usually happens, James, is that only a few fact and dimension tables hold the useful data. So try to create views: a view combines data inside the data source so you can access the result directly, which is very useful.

Think about it: to combine 300+ tables in Qlik you would use joins, which would be a big task to execute and would slow performance.

I faced this issue once. What I did was request views, as I said: the DB administrator did most of the calculations and manipulation, and I accessed the final result in Qlik.

The concept of 3-tier is this: let's say your app needs 1 minute to reload; during that time your front-end users can't use the GUI. So we create one app for modeling, where the 1-minute reload happens, and a second app, which the users access, that refreshes in a few seconds, so to the users the refresh feels almost instant.

Now, do the 70+ data sources hold different clients' data? If yes, I don't think it's best practice to merge all the data into one app.

I believe creating a separate app for each client and granting the relevant users access to their own app is the best possible solution.

Keep in mind, no matter how many tables you combine, always think about how long the engine needs to execute all the joins and functions, and whether users will like the application if it doesn't work as fast as they want.

Not applicable
Author

Just to round off this question, I've managed to come up with the following solution to our scenario.

We have those 70+ data sources. We also have an amalgamated database which lists all of these sources, their locations (such as server and DB name), and whether they are to be brought into Qlik.

Part of our tiered architecture involves a QVD generator, which loops through the above amalgamated database and generates a QVD file per table, per client into a location on the Qlik server.
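
In outline, that loop looks something like this (the connection, table and column names are just placeholders for our actual setup):

// Load the catalogue of sources, then walk it row by row
LIB CONNECT TO 'AmalgamatedDB';
Sources:
SQL SELECT ClientName, ServerName, DbName FROM dbo.QlikSources WHERE Enabled = 1;

FOR i = 0 TO NoOfRows('Sources') - 1
  LET vClient = Peek('ClientName', $(i), 'Sources');
  LET vServer = Peek('ServerName', $(i), 'Sources');
  LET vDb     = Peek('DbName', $(i), 'Sources');
  // ...connect to $(vServer)/$(vDb) and run the per-table STORE loop,
  // writing lib://QVDs/$(vClient)-<table>.qvd for each table
NEXT i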

The next tier is a bit of a two-stage process. I have one central app that deals with data load and transformation, loading data from the QVDs, but its QVD source is dynamic and changes based on where the app is published to. As part of this tier, I have used the QRS API to create our own API which, for each client, will:

  • Create a new stream with the name of the client
  • Duplicate the master app
  • Publish this duplicate to the new stream
  • Reload the data

As the master app's QVD source is dynamic and based on the name of the stream it's published to, it will load data from those QVDs only.

It also allows us to mass publish and reload at the press of a button when we make changes to the data load or a dashboard.

rohitk1609
Master

Hi James,

Can you please explain two of your key points in detail:

QVD source is dynamic and changes based on where the app is published to


and


QRS API to create our own API which for each client

Not applicable
Author

Sure,

For the first question: "QVD source is dynamic and changes based on where the app is published to"

The data load code in the Tier 2 master app looks for the QVDs based on the name of the stream it's published to. For example, if it's published to the 'ClientA' stream, it will load all the QVDs named like 'ClientA-<table_or_view_name>.qvd'. It uses basic string replacement to load the correct files for that client. This app is also locked down. The layer that loads this was using direct SQL calls during development; now we've swapped over to these QVD calls and it's working like a charm. The next part I'm working on is performing a binary load from the UI layer so that we can develop the data model and UI at the same time, given limited time and resources.
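
To illustrate the string replacement (how the stream name reaches the script isn't detailed here; assume a variable holds it):

// vClient is assumed to be set to the stream/client name when the app is duplicated
SET vClient = 'ClientA';   // placeholder
Orders:
LOAD * FROM [lib://QVDs/$(vClient)-Orders.qvd] (qvd);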


The second: "QRS API to create our own API which for each client"

I've created a small web application consisting of calls to the QRS API. First it displays a page listing all of our clients that have Qlik enabled, based on our license with them; this comes from our own database. It then queries the QRS API for streams matching each client's name. If a stream doesn't exist, it gives you the ability to call the QRS endpoint to create it.


This goes on. After you create the stream, another call becomes available that will duplicate the Tier 2 app and publish it to the client's stream. There is also an option to both duplicate and re-publish, and to reload the data. It's our first crack at a semi-automated approach to managing client apps. It also allows us to roll out changes to apps very quickly. As you can imagine, if we have 70 clients, each with an app, and we update the master, we need some way of distributing that app back out to each client. This web application lets us do that very quickly, with a single button press.
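
For reference, the QRS calls behind those buttons are roughly the following (paths from memory of the repository API, so worth checking against your Qlik Sense version; all of them need the usual xrfkey parameter/header and authentication):

POST /qrs/stream                               (create the client's stream)
POST /qrs/app/{id}/copy?name={name}            (duplicate the Tier 2 master app)
PUT  /qrs/app/{id}/publish?stream={streamid}   (publish the copy to the stream)
POST /qrs/app/{id}/reload                      (trigger a reload)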

rohitk1609
Master

No Doubt You did a great job. I really appreciate it and it shows how sound your knowledge with technology is. Th explanation is perfect but again until anyone not work it only about imagination as what is happening there . I wish I would work on same model once.