Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello Everyone,
I would like to analyze data but I'm not really sure how to start this analysis.
I'd like to count the number of "Start" in column "Type" only if the session x contains at least 1 type "Start", 1 type "Action" and 1 type "End".
Session_ID | Type |
---|---|
1 | Start |
1 | End |
2 | Start |
2 | Action |
2 | Action |
2 | End |
3 | Start |
3 | Action |
3 | End |
Typically, in the example hereabove, i would get 2 as session 1 is not considered as valid.
Here is where I can't get my mind to it. I think the usage of group by doing the import is not good as I would also need the sessions not to be grouped (so I would have to make several imports). Now, if I'd like to use aggr, I would need to merge all "Types" rows for a session and check for the string "Start, Action* End" (but I don't really know how to do.
What do you guys think would be the best option? Also, how can I merge the "type" rows to do what I want?
Thank you for your help.
Jonathan
You can also try this
=Count(DISTINCT {<Session_ID = P({<Type = {'End'}>})*P({<Type = {'Start'}>})*P({<Type = {'Action'}>})>} Session_ID)
Maybe something like
=Sum( Aggr( If(Concat(DISTINCT Type,'|') ='Action|End|Start',1,0), Session_ID))
Though this is not checking if the Types are appearing exactely once (you can remove the DISTINCT to do this).
Well, there is a long but simpler way on script side: first calculate distinct type count per session id (must be something like LOAD Session_ID, COUNT(DISTINCT Type) AS Type_Count FROM [Your Table] GROUP BY Session_ID), then you can select which one has 3 count (in this case, it is where Type_Count column equals to 3).
You can also try this
=Count(DISTINCT {<Session_ID = P({<Type = {'End'}>})*P({<Type = {'Start'}>})*P({<Type = {'Action'}>})>} Session_ID)
Hello,
Thank you for your answer.
The thing is, I don't always have 3 counts and 3 counts are not necessarily good (Start, Error, End for instance).
Is it possible to add the distinction of type (text) in the load?
Thanks,
Jonathan
Hello Stefan,
Is this valid if I have 1 action or more like using joker characters?
Thanks,
Jonathan
Hello Sunny,
Thank you it works pretty well!
However, I'd like to understand a bit more how this works.
Does it count sessions containing a Start, Action and End type?
What does the P({}) mean?
Thank you for your answer,
Jonathan
Yes, the * sign between the three p() means the intersection. The set analysis is saying that include only those session_ids where all three types exists. Look here for p() and e(): P() & E() and where do you use them?
For more advanced set analysis, check here