Feb 15, 2024 3:02:03 AM
Nov 12, 2021 4:53:07 AM
This Techspert Talks session addresses:
Tip: Download the LogAnalysis App here: “LogAnalysis App: The Qlik Sense app for troubleshooting Qlik Sense Enterprise on Windows logs.”
00:00 - Intro
01:22 - Multi-Node Architecture Overview
04:10 - Common Performance Bottlenecks
05:38 - Using iPerf to measure connectivity
09:58 - Performance Monitor Article
10:30 - Setting up Performance Monitor
12:17 - Using Relog to visualize Performance
13:33 - Quick look at Grafana
14:45 - Qlik Scalability Tools
15:23 - Setting up a new scenario
18:26 - A look at the QSST Analyzer App
19:21 - Optimizing the Repository Service
21:38 - Adjusting the Page File
22:08 - The Sense Admin Playbook
23:10 - Optimizing PostgreSQL
24:29 - Log File Analyzer
27:06 - Summary
27:40 - Q&A: How to evaluate an application?
28:30 - Q&A: How to fix engine performance?
29:25 - Q&A: What about PostgreSQL 9.6 EOL?
30:07 - Q&A: Troubleshooting performance on Azure
31:22 - Q&A: Which nodes consume the most resources?
31:57 - Q&A: How to avoid working set breaches on engine nodes?
34:03 - Q&A: What do QRS log messages mean?
35:45 - Q&A: What about QlikView performance?
36:22 - Closing
Resources:
LogAnalysis App: The Qlik Sense app for troubleshooting Qlik Sense Enterprise on Windows logs
Qlik Help – Deployment examples
Using Windows Performance Monitor
PostgreSQL Fine Tuning starting point
Qlik Sense Shared Storage – Options and Requirements
Qlik Help – Performance and Scalability
Q&A:
Q: Recently I've been facing RAM overload on our Qlik Sense proxy servers, although there are 4 nodes and each node has 16 CPUs and 256 GB of RAM. We have already optimized our apps (deleted duplicate apps, removed old data, removed unused fields...), but RAM status is still not good. What is the next step to fix the performance issue? Add more nodes?
A: It depends on what you mean by “RAM status still not good”. Qlik Data Analytics software will allocate and use memory within the limits established and does not release this memory unless the Low Memory Limit has been reached and the cache needs cleaning. If RAM consumption remains high but there are no other symptoms, your system is working as expected.
Q: As with other databases, do you think we need to perform fine-tuning and clean up bad records within PostgreSQL, e.g. once per year?
A: Periodic cleanup, especially in a rapidly changing environment, is certainly recommended. A good starting point: set your Deleted Entity Log table cleanup settings to appropriate values, and avoid clean-up tasks kicking in before the morning user ramp-up.
Q: Does QlikView Server perform similarly to Qlik Sense?
A: It uses the same QIX Engine for data processing. There may be performance differences, since QVW documents and QVF apps are completely different concepts.
Q: Is there a simple way (better than restarting the QS services) to clean the cache? Cache usage around 90% slows down QS.
A: It’s not quite that simple. Qlik Data Analytics software (and by extension, your users) benefits from keeping data cached as long as possible. This way, users consume pre-calculated results from memory instead of computing the same results over and over. Actively clearing the cache is detrimental to performance. High RAM usage is entirely normal, based on the Memory Limits defined in the QMC. You should not expect Qlik Sense (or QlikView) to manage memory like regular software. If work stops, that does not mean memory consumption will go down: we expect to receive and serve more requests, so we keep as much cached as possible. Long-winded, but I hope this sets better expectations when considering “bad performance” without the full technical context.
Q: When CPU hits 100%, how do we know what the culprit is? For example, too many concurrent users loading apps/datasets, or multiple apps/QVDs reloading? Can we see that anywhere?
A: We will provide links to the Log Analysis app I demoed during the webinar, this is a great place to start. Set Repository Performance logs to DEBUG for the QRS performance part, start analysing service resource usage trends and get to know your user patterns.
Q: Can there be repository connectivity issues with too many nodes?
A: You can only grow an environment so far before hitting physical limits to communication. As a best practice, every time a new node is added, QRS connection pools and DB connectivity should be reviewed and increased where necessary. The most common problem here is that you have added more nodes than the DB or Repository Services allow connections for. This will almost guarantee communication issues.
Q: Do the Qlik Scalability Tools measure browser rendering time as well, or do they only work at the API layer?
A: Excellent question, it only evaluates at the API call/response level. For results that include browser-side rendering, other tools are required (LoadRunner, complex to set up, expert help needed).
Transcript:
Hello everyone and welcome to the November edition of Techspert Talks. I’m Troy Raney and I’ll be your host for today's session. Today's presentation is Optimizing Performance for Qlik Sense Enterprise with Mario Petre. Mario why don't you tell us a little bit about yourself?
Hi everyone; good to be here with everybody once again. My name is Mario Petre. I’m a Principal Technical Engineer in the Signature Support Team. I’ve been with Qlik over six years now and since the beginning, I’ve focused on Qlik Sense Enterprise backend services, architecture and performance from the very inception of the product. So, there's a lot of historical knowledge that I want to share with you and hopefully it's an interesting springboard to talk about performance.
Great! Today we're going to be talking about how a Qlik Sense site looks from an architectural perspective; what are things that should be measured when talking about performance; what to monitor after going live; how to troubleshoot and we'll certainly highlight plenty of resources and where to find more details at the end of the session. So Mario, we're talking about performance for Qlik Sense Enterprise on Windows; but ultimately, it's software on a machine.
That's right.
So, first we need to understand what the Qlik Sense services are and what type of resources they use. Can you show us an overview of what a multi-node deployment looks like?
Sure. We can take a look at how a large Enterprise environment should be set up.
And I see all the services have been split out onto different nodes. Would you run through the acronyms quickly for us?
Yep. On a consumer node this is where your users come into the Hub. They will come in via the Qlik Proxy Service and consume applications via the Qlik Engine Service, that ultimately connects to the central node and everything else via the Qlik Repository Service.
Okay.
The green box is your front-end services. This is what end users tap into to consume data, but what facilitates that in the background is always the Repository Service.
And what's the difference between the consumer nodes on the top and the bottom?
These two nodes have a Proxy Service that balances against their own engines as well as other engines; while the consumer nodes at the bottom are only there for crunching data.
Okay.
And then we can take a look at the backend side of things. Resource use here depends on how much reloading you do: you will have an engine there, as well as the primary role of the central node (active and failover), which is the Repository Service coordinating communication between all the other services. You can also have a separate node for development work. And ultimately, at this size of environment we also expect a dedicated storage solution and a dedicated host for the central Repository Database, either locally managed or with one of the cloud providers, like AWS RDS for example.
Between the front-end and back-end services where's the majority of resource consumption, and what resources do they consume?
Most of the resource allocation here is going to go to the Engine Service; and that will consume CPU and RAM to the extent that it's allocated to the machine. And that is done at the QMC level where you set your Working Set Limits. But in the case of the top nodes, the Proxy Service also has a compute cost as it is managing session connectivity between the end user's browser and the Engine Service on that particular server. And the Repository Service is constantly checking the authorization and permissions. So, ultimately front-end servers make use of both front-end and back-end resources. But you also need to think about connectivity. There is the data streaming from storage to the node where it will be consumed and then loading from that into memory. And these are three different groups of resources: you have compute; you have memory, and you have network connectivity. And all three have to be well suited for the task for this environment to work well.
And we're talking about speed and performance like, how fast is a fast network? How can we even measure that?
For any Enterprise environment, we would start at a 10 Gbit/s network speed, and ultimately we expect a response time of 4 ms between any node and the storage back end.
Okay. So, what are some common bottlenecks and issues that might arise?
All right. So, let's take a look at some examples. First: the Repository Service fails to communicate with rim nodes or with local services. I would immediately verify that the Repository Service connection pool is adequate and that network connectivity is stable. Second: apps load very slowly the first time they are opened. This is where network speed really comes into play. Another example: the QMC or the Hub takes a very long time to load. For that, we would have to look into the communication between the Repository Service and the Database, because that's where we store all of the metadata that we use to calculate your permissions.
And could that also be related to the rules that people have set up and the number of users accessing?
Absolutely. You can hurt user experience by writing complex rules.
What about lag in the app itself?
This is now being consumed by the Engine Service on the consumer node. So, I would immediately evaluate resource consumption on that node, primarily CPU. Another great example is high Page File usage. We prefer memory for working with applications, so as soon as we have to cache results to disk and pull them back from there, performance will suffer. And ultimately, there is direct connectivity: how good and stable is the network between the end user's machine and the Qlik Sense infrastructure? The symptom will be on the end user side, but the root cause will almost always (I mean 99.9% of the time) be down to some effect in the environment.
So, to get an understanding of how well the machine works and establish that baseline, what can we use?
We need a baseline for CPU, RAM, disk and network. For the network, one simple way to measure it is this neat little tool called iPerf.
Okay. And what are we looking at here?
This is my central node.
Okay. And iPerf will measure what exactly?
How fast data transfer is between this central node and a client machine or another server.
And where can people find iPerf?
Great question. iPerf.fr
And it's a free utility, right?
Absolutely.
So, I see you've already got it downloaded there.
Right. You will have to download this package on both the server and the client machine that you want to test between. We'll run this “As Admin.” We call the command and specify that we want it to start in “server mode.” This will be listening for connection attempts.
Okay.
We can define the port. I will use the default one. Those ports can be found in Qlik Help.
Okay.
We set the output format to megabytes, and a refresh interval of 5 seconds is perfectly fine. And then, we want as much output as possible.
Okay.
First, we need to run this. There we go. It started listening. Now, I’m going to switch to my client machine.
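The server-side setup just described can be captured as a command line so it is repeatable. This is a minimal Python sketch that only assembles and prints the command, assuming the iperf3 build from iperf.fr is on the PATH and the iperf3 default port 5201 is used; the helper name is hypothetical, and the verbose flag `-V` is assumed to correspond to "as much output as possible".

```python
def iperf3_server_cmd(port: int = 5201, interval_s: int = 5) -> list[str]:
    """Build the iperf3 listener command described above (hypothetical helper)."""
    return [
        "iperf3",
        "-s",                   # server mode: wait for incoming test connections
        "-p", str(port),        # listening port (5201 is the iperf3 default)
        "-f", "M",              # report throughput in MBytes/sec
        "-i", str(interval_s),  # seconds between interim reports
        "-V",                   # verbose: as much output as possible
    ]


if __name__ == "__main__":
    cmd = iperf3_server_cmd()
    print(" ".join(cmd))
    # Run from an elevated ("As Admin") prompt on the server, e.g. with
    # subprocess.run(cmd, check=True); it blocks while listening.
```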
So, iPerf is now listening on the server machine and you're moving over to the client machine to run iPerf from there?
Right. Now, we've opened a PowerShell window into iPerf on the client machine. Then we call the iPerf command. This time, we're going to tell it to launch in “Client Mode.” We need to specify an IP address for it to connect to.
And that's the IP address of the server machine?
Right. Again, the port; the format so that every output is exactly the same. And here, we want to update every second.
Okay.
And this is a super cool option: if we use the bytes flag, we can specify the size of the data payload. I’m going to go with 1 GB (1024 MB). You can also define parallel connections. I want 5 for now.
So, that's like 5 different users or parallel streams of activity of 1 GB each between the server machine and this client machine?
Right. So, we actually want to measure how fast we can acquire data from the Qlik Sense server onto this client machine, so we need to reverse the test. We can just run this now and see how fast it performs.
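The client-side test can be sketched the same way. This hypothetical Python helper assumes the iperf3 flags `-c` (client mode), `-n` (payload size instead of a fixed duration), `-P` (parallel streams) and `-R` (reverse mode, where the server sends and the client receives); the IP address in the usage line is a placeholder.

```python
def iperf3_client_cmd(
    server_ip: str,
    port: int = 5201,
    interval_s: int = 1,
    payload: str = "1024M",
    streams: int = 5,
    reverse: bool = True,
) -> list[str]:
    """Build the iperf3 client command from the walkthrough (hypothetical helper)."""
    cmd = [
        "iperf3",
        "-c", server_ip,        # client mode: connect to the listening server
        "-p", str(port),        # must match the server's port
        "-f", "M",              # same MBytes/sec format as the server side
        "-i", str(interval_s),  # report every second
        "-n", payload,          # bytes to transfer instead of a fixed duration
        "-P", str(streams),     # number of parallel streams
    ]
    if reverse:
        cmd.append("-R")        # reverse: server sends, client receives
    return cmd


if __name__ == "__main__":
    # 10.0.0.10 is a placeholder; use your Qlik Sense server's IP address.
    print(" ".join(iperf3_client_cmd("10.0.0.10")))
```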
Okay. And did the server machine react the same way?
You can see that it produced output on the listening screen. This is where we started; then it received the data, and it's displaying its own statistics. And if you want to automate this, so that you have a spot check of throughput capacity between these servers, we need to use the log file option and give it a path. So, I’m going to call this “iperf_serverside…” And launch it. And now, no output is produced.
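For the automated spot check, iperf3's `--logfile` option writes the report to a file instead of the console, which is why no output appears. A minimal sketch, assuming one timestamped file per run so that results from different days can be compared side by side; the helper name, label and folder are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def with_logfile(base_cmd: list[str], log_dir: str, label: str,
                 now: Optional[datetime] = None) -> tuple[list[str], Path]:
    """Append iperf3's --logfile option with a timestamped file name (hypothetical helper)."""
    now = now or datetime.now()
    log_path = Path(log_dir) / f"iperf_{label}_{now:%Y%m%d_%H%M%S}.log"
    # With --logfile, iperf3 writes its report to the file instead of the console.
    return base_cmd + ["--logfile", str(log_path)], log_path


if __name__ == "__main__":
    server = ["iperf3", "-s", "-p", "5201", "-f", "M", "-i", "5", "-V"]
    cmd, path = with_logfile(server, "iperf_logs", "serverside")
    print(" ".join(cmd))
```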
Okay.
So, we can switch back to the client machine.
Okay. So, you're performing the exact same test again, just storing everything in a log file.
The test finished.
Okay. So, that can help you compare what's being sent with what's being received?
Absolutely. You can definitely have results presented in a way that is easy to compare across machines and across time. And initial results gave us a throughput per file of around 43.6 to 46 megabytes per second.
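To compare iPerf figures with the 10 Gbit/s starting point mentioned earlier, the per-stream rates need to be summed and converted to the same unit. A minimal Python sketch, assuming iperf3's MBytes are 1024-based (2^20 bytes); pass `10**6` if you treat them as decimal. The stream values in the usage lines are hypothetical, not the webinar's results.

```python
def total_throughput(per_stream_mb_s: list[float]) -> float:
    """Sum the per-stream rates iperf3 reports to get the aggregate rate."""
    return sum(per_stream_mb_s)


def link_utilization(total_mb_s: float, link_gbit_s: float = 10.0,
                     bytes_per_mb: int = 2**20) -> float:
    """Fraction of the link's nominal line rate being used (ignores protocol overhead)."""
    line_rate_mb_s = link_gbit_s * 1e9 / 8 / bytes_per_mb  # 10 Gbit/s ~= 1192 MiB/s
    return total_mb_s / line_rate_mb_s


if __name__ == "__main__":
    streams = [45.0] * 5  # hypothetical: five streams at ~45 MB/s each
    total = total_throughput(streams)
    print(f"{total:.1f} MB/s, {link_utilization(total):.1%} of a 10 Gbit/s link")
```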
So, what about for an end user who's experiencing issues? Can you use iPerf to test the connectivity from a user machine on a different network?
Yep. So, in the background we have our server running and waiting for connections. Let's run the connection now from the client machine. We make sure that the IP address is correct; default port; the output format in megabytes; we want it refreshed every second; and we are transferring 1 GB in 5 parallel streams in reverse mode, meaning we are copying from the server to the client machine. And let's run it.
Just seeing those numbers, they seem to be smaller than what we're seeing from the other machine.
Right. Indeed, I have some stuff in between to force it to talk a little slower. But this is one quick way to identify a spotty connection. This is where a baseline becomes gold: being able to demonstrate that your platform is experiencing a problem. Quantifying and specifying what that problem is will reduce the time you spend on outages and make you more effective as an admin.
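A baseline only pays off if new measurements are compared against it. This sketch, assuming throughput in MB/s from spot checks like the one above and a hypothetical 20% tolerance, flags a connection that has fallen noticeably below its recorded baseline; the tolerance should come from your own history.

```python
def degradation(baseline_mb_s: float, current_mb_s: float) -> float:
    """Relative drop from the recorded baseline (0.25 means 25% slower)."""
    if baseline_mb_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_mb_s - current_mb_s) / baseline_mb_s


def is_degraded(baseline_mb_s: float, current_mb_s: float,
                tolerance: float = 0.2) -> bool:
    """Flag a connection whose throughput fell more than `tolerance` below baseline."""
    return degradation(baseline_mb_s, current_mb_s) > tolerance


if __name__ == "__main__":
    # Hypothetical figures: baseline from a healthy week vs. today's spot check.
    print(is_degraded(baseline_mb_s=225.0, current_mb_s=120.0))
```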
Okay. That was network. How can admins monitor all the other performance aspects of a deployment? What tools are available and what metrics should they be measuring?
Right. That's a great question. The very basic is just Performance Monitor from Windows.
Okay.
The great thing about that is that we provide templates that also include metrics from our services.
Can you walk us through how to set up the Performance Monitor using one of those templates?
Sure thing. So, we're going to switch over first to the central node. So, the first thing that I want to do is create a folder where all of these logs will be stored.
Okay. So, that's a shared folder, good.
And this article is a great place to start. So, we can just download this attachment.
So, now it's time to set up a Performance Monitor proper. We need to set up a new Data Collector Set.
Giving it a name.
And create from template. Browse for it, and finish.
Okay. So it’s got the template. That's our new one Qlik Sense Node Monitor, right?
Yep. You'll have multiple servers all writing to the same location. The first thing is to define the name of each individual collector, and you do that here. You can also provide a subdirectory for these collectors, and I suggest having one per node name. I will call this Central Node.
Everything that comes from this node, yeah.
Correct. You can also select a schedule for when to start these. We have an article on how to make sure that Data Collectors are started when Windows starts. And then a stop condition.
Now, setting up monitors like this; could this actually impact performance negatively?
There is always an overhead to collecting and saving these metrics to a file. But the overhead is negligible.
Okay.
I am happy with how this is defined. Now, the collector on one of the nodes is set up. There is an option here called Data Manager. What's important here is to set a Minimum Free Disk; we could go with 10 GB, for example. You can also define a Resource Policy, but the important bit is Minimum Free Disk: we want to Delete Oldest (not Delete Largest). In the Data Collector itself, we should change the directory and make sure that it points to our central location instead of a local one; and we'll have to do this for every single node where we set this up.
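Performance Monitor's Data Manager enforces this policy itself; this sketch only illustrates why Delete Oldest is preferred. When free space drops below the minimum, the oldest logs go first, so the most recent data (the part you need while troubleshooting) survives even if an older file is much larger. It is a pure Python function over hypothetical file records, not a replacement for the built-in setting.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    name: str
    mtime: float  # last modified, epoch seconds
    size: int     # bytes


def files_to_delete(files: list[LogFile], free_bytes: int,
                    min_free_bytes: int) -> list[str]:
    """Pick files to delete, oldest first, until free space reaches the minimum."""
    chosen = []
    for item in sorted(files, key=lambda f: f.mtime):  # oldest first, not largest first
        if free_bytes >= min_free_bytes:
            break
        chosen.append(item.name)
        free_bytes += item.size  # space reclaimed once this file is gone
    return chosen
```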
Okay. So, that's that shared location?
Yep.
And you run the Data Collector there. And it creates a CSV file with all those performance counters. Cool.
So, here we have it now. If we take a very quick look inside, we'll see a whole bunch of metrics. And if you want to visualize these really quickly, here's a tip that wasn't on the agenda: on Windows, there is a built-in tool called Relog that is specifically designed for reformatting Performance Monitor counter logs. So, we can use Relog: we give it the name of this file; the format will be Binary; the output file keeps the same name, but with a .blg extension; and let's run it.
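The Relog step can be scripted for every node's output. A minimal sketch that builds the documented `relog <file> -f BIN -o <output>` invocation (Relog is Windows-only; the file name is a placeholder) and keeps the original name with a .blg extension, as in the demo.

```python
from pathlib import Path


def relog_to_blg_cmd(csv_path: str) -> list[str]:
    """Build a relog command converting a PerfMon CSV log to binary .blg."""
    src = Path(csv_path)
    return [
        "relog",
        str(src),                            # input counter log (CSV)
        "-f", "BIN",                         # output format: binary
        "-o", str(src.with_suffix(".blg")),  # same name, .blg extension
    ]


if __name__ == "__main__":
    # "CentralNode.csv" is a placeholder for the collector's output file.
    print(" ".join(relog_to_blg_cmd("CentralNode.csv")))
```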
And now it created a copy in Binary format. Cool thing about this Troy is that: you can just double click on it.
It's already formatted to be a little more readable. Wow! Check that out.
There we go. Another quick tip: since we're here, first thing to do is: select everything and Scale; just to make sure that you're not missing any of the metrics. And this is also a great way to illustrate which service counters and system counters we collect. As you can see, there's quite a few here.
Okay. So, that Performance Monitor is, it's set up; it's running; we can see how it looks; and that is going to run all the time or just when we manually trigger it?
You can definitely configure it to run all the time, and that would be my advice. Its value is really realized as a baseline.
Yeah. Exactly. That was pretty cool seeing how that worked, using all the built-in utilities. And that Relog formatting for Performance Monitor was new to me. Are there any other tools you'd like to highlight?
Yeah. So, Performance Monitor is built in. For larger Enterprises that may already be monitoring resources in a centralized way, there's no reason not to include the Sense resources in that live monitoring. This can be done with different solutions out there; a few that come to mind are Grafana, Datadog, and Butler SOS (from one of our own Qlik luminaries), for example.
Can we take a quick look at Grafana? I’ve heard of that but never seen it.
Sure thing. This is my host monitor sheet. It's nowhere near built to a corporate standard, but you can see here I’m looking at resources for the physical host where these VMs are running, as well as the domain controller and the main server where we've been running our CPU tests. And the great part about this is that I have historical data going back, I believe, as far as 90 days.
So, this is a cool tool that lets you like take a look at the performance and zoom-in and find the processes that might be causing some peaks or anything you want to investigate?
Right. Exactly. At the very least, it gives you a narrow time frame to look into with the other tools, narrowing down the window of your investigation.
Yeah, that could be really helpful. Now I wanted to move on to the Qlik Sense Scalability Tools. Are those available on Qlik community?
That's right. Let me show you where to find them. You can see that we support all current versions including some of the older ones. You will have to go through and download the package and the applications used for analysis afterwards. There is a link over here. So, once the package is downloaded, you will get an installer. And the other cool thing about Scalability Tools is that you can use it to pre-warm the cache on certain applications since Qlik Sense Enterprise doesn't support application pre-loading.
Oh, cool. So, you can preload applications into memory like in QlikView. Can we take a look at it?
Yes, absolutely. This is the first thing that you'll see. We'll have to create a new connection. So, I’ll open a simple one that I’ve defined here and we can take a look at what's required just to establish a quick connection to your Qlik Sense site.
Okay, but basically the scenario that you're setting up will simulate activity on a Qlik Sense site to test its performance?
Exactly. You'll need to define your server hostname. This can be any of your proxy nodes in the environment. The virtual proxy prefix. I’ve defined it as Header and authentication method is going to be WebSocket.
Okay.
And then, if we want to look at how virtual users are going to be injected into the system, scroll over here to the user section. Just for this simple test, I’ve set it up for User List where you can define a static list of users like so: User Directory and UserName.
Okay. So, it's going to be taking a look at those 2 users you already predefined and their activity?
Exactly. We need to test the connection to make sure that we can connect to the system. Connection Successful. And then we can proceed with the scenario. This is very simple but let me show you how I got this far. So, the very first thing that we should do is to Open an App.
So, you're dragging away items?
Yep. I’m removing actions from this list. Let's try to change the sheet. A very simple action. And now we have four sheets, and we'll go ahead and select one of them.
Okay, so far, we have Opening the App and immediately changing to a sheet?
Yep. That's right. This will trigger actions in sequence, exactly as you define them. It will not take into consideration things like think time, so I will just define a static wait of 15 seconds, and then you can make selections.
But this is an amazing tool for being able to kind of stress test your system.
It's very very useful and it also provides a huge amount of detail within the results that it produces. One other quick tip: while defining your scenario, use easy to read labels, so that you can identify these in the Results Application. Let's assume that the scenario is defined. We will go ahead and add one last action and that is: to close, to Disconnect the app. We'll call this “OpenApp.” We'll call this “SheetChange.” Make sure you Save. The connection we've tested; we've defined our list of users. First, let's run the scenario. There is one more step to define and that is: to configure an Executor that will use this scenario file to launch a workload against our system. Create a New Sequence.
This is just where all these settings you're defining here are saved?
Correct. This is simply a mapping between the execution job that you're defining and which script scenario should be used. We'll go ahead and grab that. Save it again; and now we can start it. And now in the background if we were to monitor the Qlik Sense environment, we would see some amount of load coming in. We see that we had some kind of issue here: empty ObjectID. Apparently I left something in the script editor; but yeah, you kind of get the idea.
So, all this performance information would then be loaded into an app that is part of the package downloaded from Qlik community. How does that look?
So, here you will see each individual result set, and you can look at multiple executor runs in a single application. Unfortunately, we don't have more than one here to showcase that, but you would see multiple colored lines. There are metrics for a little bit of everything: your session ramp, your throughput by minute; and you can change these.
CPU, RAM. This is great.
Exactly. CPU and RAM. These are not connected here because we don't have those logs, but you would have them for a run set up on your system. These come from Performance Monitor as well, so you could just use those logs, provided that the right template is in place. We also see Response Time Distribution by Action, and these are the actions I asked you to label so that they're easy to understand.
Once your deployment is large enough to need to be multi-node and the default settings are no longer the best ones for you, what needs to be adjusted in the Repository Service to keep it from choking, or to improve its performance?
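The Response Time Distribution by Action view boils down to grouping response times by action label. This hypothetical Python sketch assumes results have already been extracted as (label, milliseconds) pairs, which is not the Scalability Tools' own result format, and reports count, median and 95th percentile per label; descriptive labels such as "OpenApp" and "SheetChange" are what make the output readable.

```python
import statistics
from collections import defaultdict


def response_time_summary(results: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Group (action_label, response_ms) pairs and summarize each action."""
    by_action: dict[str, list[float]] = defaultdict(list)
    for label, response_ms in results:
        by_action[label].append(response_ms)

    summary = {}
    for label, times in by_action.items():
        # quantiles() needs at least two samples; "inclusive" interpolates
        # within the observed range. The last of the 19 cut points is the p95.
        if len(times) > 1:
            p95 = statistics.quantiles(times, n=20, method="inclusive")[-1]
        else:
            p95 = times[0]
        summary[label] = {
            "count": len(times),
            "median": statistics.median(times),
            "p95": p95,
        }
    return summary
```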
That's a great question Troy. So, the first thing that we should take a look at is how the Repository communicates with the backend Database and vice versa. The connection pool for the Repository is always based on core count on the machine. And the best rule of thumb that we have to date is to take your core count on that machine, multiply it by 5, and that will be the max connection pool for the Repository Service for that node.
Can you show us where that connection pool setting can be changed?
Yes. So, we will go ahead and take a look. Here we are on the central node of my environment. You'll have to find your Qlik installation folder. We'll navigate to the Repository folder, Util, QlikSenseUtil, and we'll have to launch this “As Admin.”
Okay.
We'll have to come to the Connection String Editor. Make sure that the path matches. We just have to click on Read so that we get the contents of these files. And the setting that we are about to change is this one.
Okay. So, the maximum number of connections that the Repository can make?
Yes. And this is (again) for each node going towards the Repository Database.
Okay.
Again, this should be the number of CPU cores multiplied by 5. If 90 is higher than that result, leave 90 in place. Never decrease it.
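The rule of thumb fits in a one-line function. A minimal sketch, assuming the default of 90 mentioned above as the floor: a 24-core node would get 120, while an 8-core node stays at 90. The function name is hypothetical.

```python
def qrs_max_pool_size(cpu_cores: int, default: int = 90) -> int:
    """Repository connection pool for one node: cores x 5, never below the default."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return max(cpu_cores * 5, default)
```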
Okay, that's a good tip.
Right. I change this to 120. I have to Save. What I like to do here is: clear the screen and hit Read again; just to make sure that the changes have been persisted in the file.
Okay.
Once that's done, we can close this. We can restart the environment. We can get out of here.
So, there you adjusted the setting of how many connections this node can make to the QSR. Then assuming we do the same on all nodes, where do we adjust the total number of connections the Repository itself can receive?
That should be the sum of the connection pool sizes in the connection strings on all of your nodes, plus 110 extra for the central node. In a default installation, here is where you can find that config file: Repository, PostgreSQL, and we'll have to open this one, postgresql.conf. Towards the end of the file…
Just going all the way to the bottom.
Here we have it: my max_connections is set to 300.
Okay. One other setting you mentioned as something to consider was the Page File. How would we make changes to that setting?
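The database-side rule can be expressed the same way: the sum of every node's Repository pool size plus 110 for the central node, written as a `max_connections` line for postgresql.conf. The pool sizes in the test are hypothetical, and PostgreSQL only applies a new `max_connections` value after a restart.

```python
def postgres_max_connections(node_pool_sizes: list[int], central_extra: int = 110) -> int:
    """Sum of every node's Repository pool size plus 110 extra for the central node."""
    return sum(node_pool_sizes) + central_extra


def postgresql_conf_line(node_pool_sizes: list[int]) -> str:
    """Render the value as a postgresql.conf setting."""
    return f"max_connections = {postgres_max_connections(node_pool_sizes)}"


if __name__ == "__main__":
    # Hypothetical site: one 24-core node (pool 120) and two nodes at the default 90.
    print(postgresql_conf_line([120, 90, 90]))
```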
Right. So, this is a Windows level setting that's found in Advanced System Settings; Advanced tab; Performance; and then again Advanced; and here we have Virtual Memory.
Okay.
We have to hit Change. We'll have to leave it at System Managed or understand exactly which values we are choosing and why. If you're not sure, the default should always be System Managed.
Now, I want to know what resources are available for Qlik Sense admins; specifically, what is the Admin Playbook?
It's a great starting place for understanding what duties and responsibilities one should be thinking about when administering a Qlik Sense site.
So, these are a bunch of tools built by Qlik to help analyze your deployment in different ways. I see weekly, monthly, quarterly, yearly, and a lot of different things are available there.
Yeah. So, we can take a look at Task Analysis, for example. The first time you run it, it's going to take about 20 minutes; thereafter about 10. The benefits: it shows you really in depth how to get to the data and then how to tweak the system to work better based on what you have.
Yeah, that's great.
Right? So, not only do we put the tools in your hands, we also show you how to build them. See here, we have instructions on how to build these objects from scratch. An absolute must-read for every system admin out there.
Mario, we've talked about optimizing the Qlik Sense Repository Service, but not about Postgres? Do larger Enterprise level deployments affect its performance?
Sure. The thing about Postgres is again: we have to configure it by default for compatibility and not performance. So, it's another component that has to be targeted for optimization.
The detail there, that anything over 1 GB in Postgres might get paged, sounds like it could certainly impact performance.
Right, because the shared buffers setting that we have by default is 1 GB, and that means only 1 GB of physical memory will be allocated to Postgres work. Now, we're talking about a large environment: 500 to maybe 5,000 apps, and thousands of users, with a peak concurrency of about 1,000 users per hour.
So, can we increase that Shared Buffer setting?
Absolutely. And in fact, I want to direct you to a really good article on performance optimization for PostgreSQL. When we talk about fine-tuning, this article is where I’d like to get started. It covers certain important factors like Shared Buffers, which we set to 1 GB by default. Their recommendation is to start with 1/4 of the physical memory in your system, and 1 GB is definitely not one quarter of the memory on most machines out there. So, it needs tweaking.
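The quarter-of-RAM starting point translates directly into a postgresql.conf line. This sketch assumes a dedicated Repository Database host, so that all of its physical memory is available to PostgreSQL; the result is a starting value to tune from, not a final one, and `shared_buffers` only takes effect after PostgreSQL is restarted.

```python
def shared_buffers_setting(physical_ram_gb: int) -> str:
    """postgresql.conf line sizing shared_buffers at 1/4 of physical memory."""
    quarter_mb = physical_ram_gb * 1024 // 4  # work in MB to avoid fractional GB
    return f"shared_buffers = {quarter_mb}MB"


if __name__ == "__main__":
    print(shared_buffers_setting(64))  # hypothetical 64 GB database host
```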
And again these are settings to be changed on the machine that's hosting the Repository Database, right?
That's correct. That's correct.
Now, is there an app that you're aware of that would be good to kind of look at all these logs and analyze what's going on with the performance?
Absolutely. This is an application that was developed to better understand all of the transactions happening in a particular environment. It reads the log files collected with the Log Collector either via the tool or the QMC itself.
Okay.
It's not built for active monitoring, but rather to enhance troubleshooting.
Sure. So, basically it's good for looking at a short period of time to help troubleshooting?
Right. The Repository itself communicates over APIs between all the nodes and keeps track of all of the activities in the system; and these translate to API calls. If we want to focus on Repository API calls, we can start by looking at transactions.
Okay.
So, this will give us detail about cost. For example, per REST or API call, we can see which endpoints take the most time and the duration per user; this gives you an opportunity to start at a very high level and slowly drill in, both in message types and timeframe. Another sheet is Threads, Endpoints and Users; here you have performance information about how many worker threads the Repository Service is able to start and what the Repository CPU consumption is, so you can easily identify the most expensive calls. For example, here, just by count, we can see that the preview privileges call for objects is called…
Yeah, a lot.
Over half a million times, right? And represents 73% of the CPU compute cost.
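The call-count and CPU-share figures on this sheet come from a simple aggregation. A hypothetical Python sketch, assuming the QRS performance log has already been parsed into (endpoint, CPU milliseconds) pairs; the log's real column layout is not shown here, and the endpoint names in the test are made up.

```python
from collections import defaultdict


def endpoint_cost(records: list[tuple[str, float]]) -> list[tuple[str, int, float]]:
    """Rank endpoints by their share of total CPU time.

    Returns (endpoint, call_count, cpu_share) tuples, highest share first.
    """
    calls: dict[str, int] = defaultdict(int)
    cpu: dict[str, float] = defaultdict(float)
    for endpoint, cpu_ms in records:
        calls[endpoint] += 1
        cpu[endpoint] += cpu_ms
    total = sum(cpu.values()) or 1.0  # avoid division by zero on empty input
    ranked = [(ep, calls[ep], cpu[ep] / total) for ep in calls]
    return sorted(ranked, key=lambda row: row[2], reverse=True)
```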
Wow, nice insights.
And then if we look here at the bottom, we can start evaluating time-based patterns and select specific time frames and go into greater detail.
So, I’m assuming this can also show resource consumption as well?
Right. CPU, memory in gigabytes, and memory in percent. One neat trick is to go to the QMC, look at how you've defined your Working Set Limits, and then pre-define reference lines in this chart, so that it's easier to visualize when those thresholds are close to being reached or breached. You do that under Add-ons > Reference lines, and you can define them like this.
That's just to sort of set that to match what's in the QMC?
Exactly.
Makes a powerful visualization. So, you can really map it.
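For the "memory in percent" chart, the reference lines are simply the QMC percentages. For the "memory in gigabytes" chart, they need converting into absolute values. A minimal sketch of that conversion follows; the function name is hypothetical, and the 70 / 90 values and 256 GB machine are illustrative inputs rather than a recommendation.

```python
def working_set_lines_gb(physical_ram_gb: float, low_pct: float, high_pct: float) -> tuple[float, float]:
    """Convert QMC working set limits (percent of physical RAM) into absolute GB values."""
    if not 0 < low_pct < high_pct <= 100:
        raise ValueError("expected 0 < low_pct < high_pct <= 100")
    low_gb = physical_ram_gb * low_pct / 100
    high_gb = physical_ram_gb * high_pct / 100
    return low_gb, high_gb


low, high = working_set_lines_gb(256, 70, 90)
print(f"Reference lines: low = {low:.1f} GB, high = {high:.1f} GB")
```

For a 256 GB engine node with limits of 70 and 90 percent, the lines would sit at 179.2 GB and 230.4 GB.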
Absolutely. And you can always drill down into specific points in time: we can go to the log details Engine Focus sheet, which lets us browse over time and select things like errors and warnings alone. Then we have all of the messages coming from the log files, along with their sources.
Yeah. That's great to have it all kind of collected here in one app, that's great.
Indeed.
To summarize, we've talked about how, to understand system performance, a baseline needs to be established. That involves setting up some monitoring. There are lots of options and tools available to do that, and it's really about understanding how the system performs, so that measurements and comparisons are possible if things don't perform as expected.
And to begin to optimize as well.
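Once a baseline exists, "things don't perform as expected" can be made concrete by comparing new measurements against it. The sketch below uses one common rule of thumb, flagging a value that exceeds the baseline mean by more than `k` standard deviations. The samples (for example, CPU percent collected with Performance Monitor) and the default `k = 3` are assumptions for illustration, not thresholds taken from the session.

```python
import statistics


def exceeds_baseline(baseline: list[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` is more than k standard deviations above the baseline mean.

    Assumes `baseline` holds at least two samples of the same metric,
    collected under normal operating conditions.
    """
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return current > mean + k * stdev


# Hypothetical CPU % samples from a quiet week on an engine node
baseline_cpu = [40.0, 42.0, 38.0, 41.0, 39.0]
print(exceeds_baseline(baseline_cpu, 55.0))
print(exceeds_baseline(baseline_cpu, 43.0))
```

With this baseline (mean 40, standard deviation about 1.6), 55 is flagged and 43 is not.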
Okay, great. Well now, it's time for Q&A. Please submit your questions through the Q&A panel on the left side of your ON24 console. Mario, which question would you like to address first?
We have some great questions already. So, let's see - first one is: how can we evaluate our existing Qlik Sense applications?
This is not something I've covered today, but it's a great question. We have an application on Community called App Metadata Analyzer. You can import it into your system and use it to understand the memory footprint of applications and of the objects within those applications, and how they scale inside your system. It will very quickly illustrate whether you are shipping applications with extremely large data files (for example) that are almost never used. You can use that as a baseline both for optimizing local applications and in your efforts to migrate to SaaS. And if you feel like you don't want to bother with all of this performance monitoring and optimization, you can always choose to use our services and we'll take care of that for you.
Okay, next question.
So, the next question: worker schedulers errors and engine performance. How to fix?
I think I would definitely point you back to this Log Analysis application. Load the time frame where you think something bad happened, and see what kind of insights you can get by playing with and exploring the data. Then narrow the search down, and if you find a specific pattern where the product seems to be misbehaving, talk to Qlik Support. We'll evaluate that with you and determine whether it is a defect or just a quirk of how your system is set up. But the Sense Log Analysis app is a great place to start. And going back to the sheet that I showed: Repository and Engine metrics are all collected there. These come from the performance logs that Qlik Sense already produces, so you don't need to load any additional performance counters to get those details.
Okay.
All right. So, there is a question here about Postgres 9.6 and the fact that it's soon coming to end of life. I think this is a great moment to talk about this. Qlik Sense client-managed, or Qlik Sense Enterprise on Windows, supports Postgres 12.5 for new installations since the May 2021 release. If you have an existing installation, 9.6 will continue to be used, but there is an article on Community on how to upgrade it in place to 12.5 as a standalone component. So, you don't have to continue using 9.6 if your IT policy is complaining about the fact that it's soon coming to end of life. As I said, we are aware of this, and we are shipping a new version as of the May 2021 release.
Oh, great.
So, here's an interesting question. If we have Qlik Sense in Azure on a virtual machine, why is the performance so sluggish? How do you fine-tune it? I guess first we need to understand what you mean by sluggish. But the first thing I want to point to is the different instance types. Virtual machines from cloud providers are optimized for different workloads, and the same is true for AWS, Azure, and Google Cloud Platform. You will have virtual machines that are optimized for storage, ones that are optimized for compute tasks or application analytics, and some that are optimized for memory. Make sure that you've chosen the right instance type and the right level of provisioned IOPS for this application. If you feel that your performance is sluggish, start increasing those resources: go one tier up and re-evaluate until you find an instance type that works for you. If you wish to have these results beforehand, consider running the Scalability Tools with some of your applications against different instance types in Azure to determine which ones work best.
Just to kind of follow up on that question, if we're looking at that multi-node example from Qlik help, what nodes would you consider would require more resources?
Worker nodes in general, both front-end and back-end.
So, a worker node is something with an engine, right?
Exactly. Something with an engine. It can either be front-facing, together with a proxy to serve content, or back-end, together with a scheduler service to perform reload tasks. These will consume all the resources available on a given machine.
Okay.
And this is how the Qlik Sense engine is designed to work. These resources are almost never released unless there is a reason for it, because keeping those results cached is what makes the product fast.
Okay.
Oh, here's a great one about avoiding working set breaches on engine nodes. The question says: do you have any tips for avoiding the max memory threshold from the QIX engine? We didn't really cover this aspect, but as you know, the engine allows you to configure both the low and the high memory limit. To understand how these work, I want to point you back to that QIX engine white paper. The system will perform certain actions when these thresholds are reached. The first prompt I have for you in this situation is: understand whether these limits are far away from your physical memory limit. By default, Qlik Sense (I believe) uses 70 / 90 as the low and high working sets on a machine. With a lot of RAM, let's say 256 GB to half a terabyte, if you leave the low working set limit at 70 percent, that means by default 30 percent of your physical RAM will not be used by Qlik Sense. So, always keep in mind that these percentages are based on the physical amount of RAM available on the machine, and as soon as you deploy large machines (large: I'm talking 128 GB and up), you have to redefine these parameters. Raise them so that you utilize almost all of the resources available on the machine, and you should be able to visualize that very easily in the Log Analysis App by going to the Engine Load sheet and inserting those reference lines based on where your current working sets are. Of course, the only real way to avoid a working set limit issue is to make sure that you have enough resources and that the system is configured to utilize them. So, raise the limits and allow the product to use as much RAM as it can, without of course interfering with Windows operations (which is why you should never set these to something like 98 or 99). Windows needs RAM to operate by itself, and if we let Qlik Sense take all of it, it will break things.
If you've done that and you're still having performance issues, that means you need more resources.
Yeah. It makes sense.
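The working set discussion above is percentage arithmetic on physical RAM. The sketch below shows two parts of it: how much RAM sits below a given low limit, and the highest whole-number percentage that still leaves a chosen amount of RAM for Windows. Both function names are hypothetical, and the 32 GB reserve in the example is an illustrative assumption, not a Qlik recommendation; size it for your own servers.

```python
import math


def unused_below_low_limit_gb(physical_ram_gb: float, low_pct: float) -> float:
    """RAM (GB) above the low working set limit, i.e. (100 - low_pct)% of physical RAM."""
    return physical_ram_gb * (100 - low_pct) / 100


def max_limit_pct(physical_ram_gb: float, os_reserve_gb: float) -> int:
    """Highest whole percentage that still leaves `os_reserve_gb` for the operating system.

    `os_reserve_gb` is an assumption you choose; the result is rounded down
    so the reserve is never eaten into.
    """
    if not 0 < os_reserve_gb < physical_ram_gb:
        raise ValueError("os_reserve_gb must be between 0 and physical_ram_gb")
    return math.floor(100 * (physical_ram_gb - os_reserve_gb) / physical_ram_gb)


print(unused_below_low_limit_gb(512, 70))
print(max_limit_pct(512, 32))
```

On a 512 GB node, a 70 percent low limit leaves about 153.6 GB above it, while keeping a hypothetical 32 GB for Windows allows limits of up to 93 percent, well short of the 98 or 99 warned against above.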
Oh, so here is another interesting question about understanding what certain Qlik Repository Service (QRS) log messages say. The question says: we try to meet the network and persistence recommendation that network latency should be less than 4 ms, but in our logs we consistently see messages saying QRS security management retrieved privileges in so many milliseconds. Could this be a Repository Service issue, or where would you suggest we investigate first? This is an info-level message, and it's simply telling you how long it took the Repository Service to compute the result for that request. It doesn't mean that this is how long it took to talk to the database and back, or how long it took for the request to travel from the client to the server; only how long it took the Repository Service to look up the metadata, look up the security rules, and then return a result based on that. And I would say 384 milliseconds is rather quick. It depends on how you've defined these security rules. If your security rules are super simple and you are still getting slow responses, we would definitely have to look at resource consumption. But if you want to know how these calls affect resource consumption on the Repository and Postgres side, go back to that Log Analysis App. Raise your Repository performance logs in the QMC to Debug level so that you get all of the performance information on how long each call took to execute. Then try to establish some patterns. See if you have calls that take longer to execute than others, and where those are coming from: any specific apps, any specific users? All of these answers come from drilling down into the data via the app that I demoed.
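To look for the patterns described above outside the app, one option is to pull the reported durations out of the log text and summarize them. The sketch below assumes the duration appears as "retrieved privileges in N milliseconds"; the sample lines are hypothetical and real QRS log lines carry more fields (timestamps, user, thread), so adjust the pattern to your actual log format before relying on it.

```python
import re
import statistics

# Assumed message shape; verify against your own QRS logs.
PATTERN = re.compile(r"retrieved privileges in (\d+(?:\.\d+)?) milliseconds", re.IGNORECASE)


def privilege_durations_ms(lines):
    """Return the durations (ms) found in lines matching the assumed message text."""
    durations = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


# Hypothetical log lines for illustration only
sample = [
    "INFO QRS security management retrieved privileges in 384 milliseconds",
    "INFO QRS security management retrieved privileges in 12 milliseconds",
    "DEBUG some unrelated message",
    "INFO QRS security management retrieved privileges in 1500 milliseconds",
]

durations = privilege_durations_ms(sample)
print(f"{len(durations)} calls, median {statistics.median(durations):.0f} ms, max {max(durations):.0f} ms")
```

A median far below the maximum, as here, points toward a few outlier calls worth tracing back to specific apps or users.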
Okay Mario, we have time for one last question.
Right. And I think this is an excellent one to end on. We talked a whole bunch here about Qlik Sense, but all of this also applies to QlikView environments. We are always looking at taking a step back and considering all of the resources that play a part in the ecosystem, not just the product itself. The question asks: does QlikView Server handle resources similarly to Qlik Sense? The answer is yes. The engine is exactly the same in both products. If you read that white paper, you will understand how it works in both QlikView and Qlik Sense, and the things you should do to prepare for performance and optimization are exactly the same in both products. Excellent question.
Great. Well, thank you very much Mario!
Oh, it's been my pleasure Troy. That was it for me today. Thank you all for participating. Thank you all for showing up. Thank you Troy for helping me through this very very complicated topic. It's been a blast as always. And to our customers and partners, looking forward to seeing your questions and deeper dives into logs and performance on community.
Okay, great! Thank you everyone! We hope you enjoyed this session. Thank you to Mario for presenting. We appreciate getting experts like Mario to share with us. Here's our legal disclaimer and thank you once again. Have a great rest of your day.
@Troy_Raney - Thanks for the great set of resources to help monitor and optimize QS performance.
Any chance you can share a link to the Log Analysis app mentioned in the video? Would love to deploy it in our environment.
Hello @ozz1k
Here it is! LogAnalysis App: The Qlik Sense app for troubleshooting Qlik Sense Enterprise on Windows logs
All the best,
Sonja
@Sonja_Bauernfeind - this is perfect, thank you!
Great, great content! Very helpful.
I have a question about the QRS max connection pool. Are we talking about physical or logical core count? It doubles the numbers, so it's crucial to distinguish 😉
A question here, @Sonja_Bauernfeind:
when you say "That should be a sum of all of the connection strings from all of your nodes plus 110 extra for the central node", do you mean that the central node's amount has to be counted twice?
Thanks,
Hello @Giovanni_Civardi
Would this article help? Recommended practice on configuration for Qlik Sense
It includes more information on the max connections as well as how to find your database max pool.
All the best,
Sonja
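The formula quoted in this thread can be sketched as plain arithmetic. This is one literal reading of it, in which each node's repository connection pool size is summed once and a flat 110 is added for the central node; the node names and pool sizes below are hypothetical, and the linked article remains the authority on whether the central node should be counted differently.

```python
def postgres_max_connections(node_pool_sizes: dict[str, int], central_extra: int = 110) -> int:
    """One literal reading of the quoted formula: sum of every node's pool size plus a flat extra.

    `node_pool_sizes` maps a (hypothetical) node name to its configured pool size;
    the central node appears in the map once, like any other node.
    """
    if any(size < 0 for size in node_pool_sizes.values()):
        raise ValueError("pool sizes must be non-negative")
    return sum(node_pool_sizes.values()) + central_extra


# Hypothetical three-node site
pools = {"central": 90, "rim-1": 90, "rim-2": 90}
print(postgres_max_connections(pools))
```

Under this reading, the example site would need a PostgreSQL max_connections of at least 380.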
@Sonja_Bauernfeind How do I calculate PostgreSQL's shared buffers for a large environment?
Hello @Oaten
For in-depth scaling assistance, Qlik offers our Professional Services to be able to give you a suitable answer for your requirements.
All the best,
Sonja
Hello, thank you very much for the video.
When iPerf gives us the bandwidth information, regardless of the number it reports, how can I tell whether the result is optimal or slow? What would be an optimal bandwidth for data delivery between the central node and a rim node? For example: over 70 Mb is optimal.
Thank you very much.
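The thread doesn't give an official bandwidth threshold. One way to make an iPerf number easier to judge is to compare the measured throughput with the nominal speed of the link between the nodes. The sketch below assumes the test was run with iperf3's JSON output option (`-J`) over TCP, where the receiver-side total is reported under `end.sum_received.bits_per_second`. The helper names, the sample JSON, and the 1000 Mbit/s nominal link speed are illustrative assumptions.

```python
import json


def received_mbps(iperf3_json: str) -> float:
    """Extract receiver-side throughput in Mbit/s from `iperf3 -J` output of a TCP test."""
    data = json.loads(iperf3_json)
    bits_per_second = data["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1_000_000


def link_utilization(measured_mbps: float, nominal_mbps: float) -> float:
    """Fraction of the link's nominal speed that the test achieved."""
    if nominal_mbps <= 0:
        raise ValueError("nominal_mbps must be positive")
    return measured_mbps / nominal_mbps


# Trimmed, hypothetical iperf3 JSON; real output contains many more fields.
sample = '{"end": {"sum_received": {"bits_per_second": 943000000.0}}}'
mbps = received_mbps(sample)
print(f"{mbps:.0f} Mbit/s, {link_utilization(mbps, 1000):.0%} of a 1 Gbit/s link")
```

A result close to the nominal link speed suggests the network path itself is not the bottleneck; a result far below it is a reason to look at the network before tuning Qlik Sense.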