Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us to spark ideas for how to put the latest capabilities into action. Register here!
cancel
Showing results for 
Search instead for 
Did you mean: 
nicolaslopedebarrios
Contributor II
Contributor II

Anomaly detection analysing log using Qlik Sense

Hi,

We have a big data table (40M records) which is the log from an app usage, and we'd like to detect some anomalies over a short period of time, such as users with more than 30 queries in 1 minute.

Suppose the table contains these columns:

  • query_id,
  • query_timestamp,
  • username,
  • service
  • query_type
  • query_parameters

and each row corresponds to one query over a service made by a user on a specific timestamp.

How can we achieve that?

Labels (2)
4 Replies
Qrishna
Master
Master

I believe you are looking for some document analyzer app:

https://adminplaybook.qlik-poc.com/docs/tooling/app_metadata_analyzer.html#where-to-get-it

 

note: you may not have all the required stuff, but can help you get started.

Vegar
MVP
MVP

What if you transform your timestamp to a minute level timestamp 

Timestamp(Floor(query_timestamp, 1/(24*60))) as minute_level_timestamp

Then you just need to count the rows per user and minute_level_timestamp to get the query per minute and user calculation 

nicolaslopedebarrios
Contributor II
Contributor II
Author

Interesting... so a row still represents a query, but I would have repeated "minute timestamps" and usernames for some users. Thanks, @Vegar I'll give it a try.

nicolaslopedebarrios
Contributor II
Contributor II
Author

Hello again, 

I finished my anomaly detection page, but I'm not completely satisfied, so I want to share with you what I did. Please share your thoughts, and let me know if I'm doing something wrong, or could have been done differently.

LOAD statement (partial):

Date(Floor(query_timestamp), 'DD/MM/YYYY') as FECHA_CONSULTA,
Timestamp(Floor(query_timestamp, 1/(24*60))) as FECHA_CONSULTA_MINUTO,
Timestamp(Floor(query_timestamp, 1/24)) as FECHA_CONSULTA_HORA,
Date(query_timestamp, 'DD/MM/YYYY HH:MM:SS') as FECHA_HORA_CONSULTA,

But, when displaying these columns in a table, shouldn't the minute part of FECHA_CONSULTA_MINUTO  (3rd column) reflect the same minute as FECHA_HORA_CONSULTA (the first column)?

For the calculations, I've created master measures like this:

Aggr(Count(query_id), FECHA_CONSULTA_MINUTO, username)

to get the number of queries per user per minute.

The result appears to be correct.