dwqlik82
Creator

MongoDB Server process maxing out CPU

Hi,

We're having an issue with our on-prem alerting server: it is maxing out CPU usage to the point where alerts aren't going out, and even connecting via mstsc is a problem (we have to ask the server team to reboot it via VMware). This only appears to have been happening for the past 7 days. The one time I managed to get onto it while it was slow, the MongoDB server process was hammering the CPU: overall utilisation was at 99%, and the MongoDB service accounted for roughly 90-95% of that the whole time. I can't see any new alerts or drastic changes to any of the apps that the alerts are built on.

Anyone else having similar issues?

Thanks,

Dale

1 Solution

Accepted Solutions
dwqlik82
Creator
Author

I've possibly found a solution, or at least a mitigation. Whenever the CPU was being hammered, there were several queries all mentioning the datahistory collection. As the server was essentially unusable at this point, I took a chance and removed all documents from datahistory, datahistorypositivevalues and datahistorynegativevalues, and for good measure I did the same for notificationhistory and notificationresult, as they held large numbers of records. I've been keeping an eye on it for just under 24 hours: no issues that I can see, and alerts are still going out. Obviously I don't recommend this unless you have nothing to lose (we were at the point where the server was either going to be turned off or completely wiped and rebuilt from scratch, so we were risking nothing).
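
A minimal sketch of how such a wipe could be done from the mongo shell, for illustration only. The helper name clearCollections and its dryRun flag are hypothetical, the database name qlikalerting is the one reported later in this thread, and countDocuments assumes a shell from MongoDB 4.0.3 or later. Take a backup first.

```javascript
// Hypothetical helper: report the size of each collection and, unless dryRun
// is true, delete every document in it. `database` is a mongo shell db object.
function clearCollections(database, names, dryRun) {
    return names.map(function (name) {
        var coll = database.getCollection(name);
        var before = coll.countDocuments({});
        // deleteMany({}) removes all documents but keeps the collection and its indexes.
        var deleted = dryRun ? 0 : coll.deleteMany({}).deletedCount;
        return { name: name, before: before, deleted: deleted };
    });
}

// Usage inside the mongo shell (commented out so nothing is deleted by accident):
// var alerting = db.getSiblingDB("qlikalerting");
// var names = ["datahistory", "datahistorypositivevalues", "datahistorynegativevalues",
//              "notificationhistory", "notificationresult"];
// printjson(clearCollections(alerting, names, true));   // dry run: counts only
// printjson(clearCollections(alerting, names, false));  // actually delete
```

Running with dryRun set to true first shows how many documents each collection holds before anything is removed.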

8 Replies
dwqlik82
Creator
Author

I found the issue: it seems one single alert was causing it. The db.currentOp() output in mongo showed 6 operations just stuck there. Weirdly, that alert hadn't been modified in over a month, and even when it was disabled, opening it in the front-end GUI would trigger the issue (using db.killOp() on the offending operations would bring the server back without having to restart the mongo service). The same issue also occurred when I duplicated the alert (presumably the duplicate would have a distinct AlertId?). I have now recreated the alert from scratch and am slowly adding users back in. So far it hasn't recurred, but I'm keeping an mstsc window open and watching Task Manager like a hawk...
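
A minimal sketch of how stuck operations could be picked out of the db.currentOp() output before killing them. The helper name findStuckOps, the 60-second threshold and the "qlikalerting." namespace prefix are assumptions for illustration; opid, active, secs_running and ns are standard fields of the currentOp output.

```javascript
// Hypothetical helper: select operations that are active, have been running
// for at least minSeconds, and touch a namespace starting with namespacePrefix.
// `inprog` is the array found in db.currentOp().inprog.
function findStuckOps(inprog, minSeconds, namespacePrefix) {
    return inprog.filter(function (op) {
        var ns = op.ns || "";
        return op.active === true &&
            (op.secs_running || 0) >= minSeconds &&
            ns.indexOf(namespacePrefix) === 0;
    });
}

// Usage inside the mongo shell (commented out so nothing is killed by accident):
// findStuckOps(db.currentOp().inprog, 60, "qlikalerting.").forEach(function (op) {
//     printjson(op);          // inspect the operation first
//     // db.killOp(op.opid);  // then kill only what you are sure about
// });
```

Filtering on the namespace keeps internal operations (replication, admin commands) out of the kill list.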

 

This was an alert sending to mobiles only, not sure if that would make a difference?

ta

Dale

dwqlik82
Creator
Author

After further digging in mongo, it does appear that mobile alerts are the cause. Disabling an alert with ~15 users (some with mobiles registered, some without) stopped the CPU thrashing. Another alert with just 2 users (mobile only) causes the CPU to thrash for 60 seconds to 2 minutes, but it calms down after that. This appears to be a new issue with the latest release, as both alerts ran on the previous November release for over a year with seemingly very few issues.

giociva
Partner - Creator

Same issue here after upgrading to the new release. I will open a support case for it.

dwqlik82
Creator
Author

Cheers. Between this and a separate issue with alerting randomly failing to pick up the licence from the Qlik server (without telling you anywhere in the front end...), we are pretty much at the point of advising people not to use it and, where alerts are already in use, not to trust a lack of alerts (which makes them pretty much pointless).

dwqlik82
Creator
Author

@giociva Hi, just checking: did you get anything back from your support case in the end?

SwapneelGolapkar
Contributor III

Hi @dwqlik82  -

We are facing a similar situation, with very high CPU consumption.

It would be great if you could guide us on how to remove all documents from datahistory, datahistorypositivevalues and datahistorynegativevalues, as mentioned in your answer.

 

Kind Regards,

Swapneel

dwqlik82
Creator
Author

Hi,

I used an IDE that a previous developer had already installed on the server: Robo 3T (judging by the popup on open, it has now been superseded by Studio 3T), with an existing connection to localhost:27017. For us the database was called qlikalerting, and it contained 34 collections.

I've subsequently set up a scheduled task to execute the following .js file daily to delete anything over 7 days old:


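// Select the database explicitly: the "use <db>" shell helper is not valid
// JavaScript when the file is run as a script, whereas this works either way.
db = db.getSiblingDB('qlikalerting');

// Log what is currently running, for reference.
printjson(db.currentOp());

// Cutoff 7 days ago, as epoch milliseconds (the same value the original
// expression new Date()-(1000*60*60*24*7) produced).
// If timestamp is stored as a BSON Date, use new Date(cutoff) in the filter instead.
var cutoff = Date.now() - (1000 * 60 * 60 * 24 * 7);
db.getCollection('datahistory').deleteMany({ timestamp: { $lt: cutoff } });

// Delete documents whose scanId no longer has a matching datahistory document.
function removeOrphans(name) {
    db.getCollection(name).aggregate([
        { "$lookup": {
            "from": "datahistory",
            "localField": "scanId",
            "foreignField": "scanId",
            "as": "scanCheck" } },
        { "$match": { "scanCheck": { "$size": 0 } } }
    ]).forEach(function (doc) {
        // deleteOne replaces the deprecated remove()
        db.getCollection(name).deleteOne({ "_id": doc._id });
    });
}

removeOrphans('datahistorypositivevalues');
removeOrphans('datahistorynegativevalues');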

The script is executed via a .bat file.

 

It seems to work for us. I have no idea whether it's breaking something else, but as the server was essentially unusable beforehand I didn't see much risk. Obviously treat it with caution: make sure you understand what it's doing, be ready to roll back to a backup, etc.
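
One way to check what a 7-day purge would do before running it is to count the affected documents first. The sketch below is illustrative only: retentionCutoff and previewPurge are hypothetical helper names, and it assumes timestamp is stored as epoch milliseconds, which is what the script's numeric comparison implies. If the field is a BSON Date, compare against new Date(cutoff) instead.

```javascript
// Hypothetical helper: epoch-millisecond cutoff for a retention period in days.
function retentionCutoff(nowMs, days) {
    return nowMs - days * 24 * 60 * 60 * 1000;
}

// Hypothetical helper: count how many datahistory documents are older than the
// cutoff without deleting anything. `database` is a mongo shell db object.
function previewPurge(database, days, nowMs) {
    var cutoff = retentionCutoff(nowMs, days);
    var coll = database.getCollection("datahistory");
    return {
        cutoff: cutoff,
        total: coll.countDocuments({}),
        expired: coll.countDocuments({ timestamp: { $lt: cutoff } })
    };
}

// Usage inside the mongo shell:
// printjson(previewPurge(db.getSiblingDB("qlikalerting"), 7, Date.now()));
```

If expired comes back as 0 on a collection that clearly holds old data, that is a hint the timestamp field has a different type than assumed.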

 

Good luck!