Skip to main content
Announcements
Get Ready. A New Qlik Learning Experience is Coming February 17! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
nicolaslopedebarrios
Contributor II
Contributor II

Anomaly detection analysing log using Qlik Sense

Hi,

We have a big data table (40M records) which is the log from an app usage, and we'd like to detect some anomalies over a short period of time, such as users with more than 30 queries in 1 minute.

Suppose the table contains these columns:

  • query_id,
  • query_timestamp,
  • username,
  • service
  • query_type
  • query_parameters

and each row corresponds to one query over a service made by a user on a specific timestamp.

How can we achieve that?

Labels (2)
4 Replies
Qrishna
Master
Master

I believe you are looking for some document analyzer app:

https://adminplaybook.qlik-poc.com/docs/tooling/app_metadata_analyzer.html#where-to-get-it

 

note: you may not have all the required stuff, but can help you get started.

Vegar
MVP
MVP

What if you transform your timestamp to a minute level timestamp 

Timestamp(Floor(query_timestamp, 1/(24*60))) as minute_level_timestamp

Then you just need to count the rows per user and minute_level_timestamp to get the query per minute and user calculation 

nicolaslopedebarrios
Contributor II
Contributor II
Author

Interesting... so a row still represents a query, but I would have repeated "minute timestamps" and usernames for some users. Thanks, @Vegar I'll give it a try.

nicolaslopedebarrios
Contributor II
Contributor II
Author

Hello again, 

I finished my anomaly detection page, but I'm not completely satisfied, so I want to share with you what I did. Please share your thoughts, and let me know if I'm doing something wrong, or could have been done differently.

LOAD statement (partial):

Date(Floor(query_timestamp), 'DD/MM/YYYY') as FECHA_CONSULTA,
Timestamp(Floor(query_timestamp, 1/(24*60))) as FECHA_CONSULTA_MINUTO,
Timestamp(Floor(query_timestamp, 1/24)) as FECHA_CONSULTA_HORA,
Date(query_timestamp, 'DD/MM/YYYY HH:MM:SS') as FECHA_HORA_CONSULTA,

But, when displaying these columns in a table, shouldn't the minute part of FECHA_CONSULTA_MINUTO  (3rd column) reflect the same minute as FECHA_HORA_CONSULTA (the first column)?

For the calculations, I've created master measures like this:

Aggr(Count(query_id), FECHA_CONSULTA_MINUTO, username)

to get the number of queries per user per minute.

The result appears to be correct.