Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
SkitzN27
Creator
Creator

Count/Sum Missing steps using a bar chart

Hi guys,

This is the most complicated query I came across, can anyone please help?

 

Question:

Every candidate must follow this sequence of steps to get recruited by a company
1. Apply for Job
2. Pass the Exam
3. Interview
4. Sign the Offer Letter
5. Background Check


Some of the candidates don't reach the final step and may never get a job.
Some candidates reach the final step but might have some missing steps in between.
(Like Kyle got a job but he didn't go through the 'Pass the Exam' step so I'll count 1 missing step here.
Similary Jack never made it to the last step, he couldn't pass the exam, so we won't consider any missing steps for him.
For King, the last step was 'Sign the Offer Letter', but he has a missing step, he didn't go through
interview, so we will count that as 1)

So basically, for any of the candidates whatever be the last step, they shouldn't have any missing steps in between.

Now, I want to take the count of missing step and present it using a bar chart. Can anyone help please?
It's the most complicated query I came across so far.

Sample Data:

Load * inline
[
Date, CandidateName, CandidateStatus, Counter
1/1/2020 ,Dan, Apply for Job, 1
1/3/2020 ,Dan, Pass the Exam, 1
1/5/2020 ,Dan, Interview, 1
1/7/2020 ,Dan, Sign the Offer Letter, 1
1/9/2020 ,Dan, Background Check, 1
2/1/2020 ,Kyle, Apply for Job, 1
2/5/2020 ,Kyle, Interview, 1
2/7/2020 ,Kyle, Sign the Offer Letter, 1
2/8/2020 ,Kyle, Background Check, 1
1/9/2020 ,Jack, Apply for Job, 1
2/8/2020 ,King, Apply for Job, 1
2/9/2020 ,King, Pass the Exam, 1
2/11/2020 ,King, Sign the Offer Letter, 1
3/11/2020 ,Steve, Apply for Job, 1
4/11/2020 ,Steve, Interview, 1
5/11/2020 ,Ray, Apply for Job, 1
6/11/2020 ,Ray, Background Check, 1
];

 

Output:

Screen Shot 2020-11-16 at 4.48.36 PM.png

 

Thank you so much.

@dilipranjith 

@QFabian @MayilVahanan @rwunderlich  @Kushal_Chawda   @Taoufiq_Zarra 

14 Replies
edwin
Master II
Master II

its just semantics, i used the count of candidates to count missed steps.  a step missed is really a candidate missing a step.

SkitzN27
Creator
Creator
Author

@edwin  So you still didn't get the point, Jack doesn't have any missing steps because he never made it to the next round. We are always having the updated data, so the mex(step) for jack is Applied.

Do you get what I'm trying to say?

edwin
Master II
Master II

hope the following helps:

the key is determining the last step a candidate has taken, then look for steps below the last step that is missing.

first assign a seq number to each step, then get the max of the step number per candidate, left join your data with the steps + seq , eliminate all steps that are less than the max, and anything that does not have a match (set ween the step seq and your data is the missing step:

SampleData:
Load * inline [
Date, CandidateName, CandidateStatus, Counter
1/1/2020 ,Dan, Apply for Job, 1
1/3/2020 ,Dan, Pass the Exam, 1
1/5/2020 ,Dan, Interview, 1
1/7/2020 ,Dan, Sign the Offer Letter, 1
1/9/2020 ,Dan, Background Check, 1
2/1/2020 ,Kyle, Apply for Job, 1
2/5/2020 ,Kyle, Interview, 1
2/7/2020 ,Kyle, Sign the Offer Letter, 1
2/8/2020 ,Kyle, Background Check, 1
1/9/2020 ,Jack, Apply for Job, 1
2/8/2020 ,King, Apply for Job, 1
2/9/2020 ,King, Pass the Exam, 1
2/11/2020 ,King, Sign the Offer Letter, 1
3/11/2020 ,Steve, Apply for Job, 1
4/11/2020 ,Steve, Interview, 1
5/11/2020 ,Ray, Apply for Job, 1
6/11/2020 ,Ray, Background Check, 1
];



Steps:
load distinct CandidateStatus Resident SampleData;

inner join (Steps) load * inline [
CandidateStatus, Seq
Apply for Job, 1
Pass the Exam, 2
Interview, 3
Sign the Offer Letter, 4
Background Check, 5
];


//get the last step for each candidate---------

left join (SampleData) load  CandidateStatus, Seq Resident Steps;

NoConcatenate Candidates: load CandidateName, max(Seq) as LastStep Resident SampleData group by CandidateName;

inner join (Candidates)
load  CandidateStatus, Seq resident Steps;

NoConcatenate MissingSteps: load CandidateName, CandidateStatus, LastStep, Seq as MissingStep Resident Candidates where Seq <= LastStep;

left join (MissingSteps) load CandidateName, CandidateStatus, Seq Resident SampleData;

drop table SampleData, Steps, Candidates;
exit script;  // added this to allow for testing, comment this out to get the final result

NoConcatenate
Final: load CandidateName, CandidateStatus, LastStep, MissingStep resident MissingSteps where isnull(Seq);

drop table MissingSteps;

 

attached is the resulting table.  i added an exit script so you can test the result.  comment the exit to get the final result.  the concept is do a full outer join of all steps with your data, trim the ones that is more than the last step taken by the candidate, then anything that has a NULL step taken is the missing step

Assumption, Background check is the last step and the first applicant in your data shows all the steps in the right order.

 

edwin
Master II
Master II

i should say, another solution is to do a peek and apply a couple of business rules to determine the missed step.  that is good for a few hundred or thousand rows but not for large data sets.  QV scripting works best using data sets and data manipulation using table functions like joins.  im sure someone could come up with better coding that is efficient but above is my quick and dirty to give you an idea how to ultimately solve your problem.

i hope i understood the problem correctly

edwin
Master II
Master II

here is a simpler version, again, whats needed is identify what is the last step taken by the candidate, compare it to all the steps prior to the last step and identify which ones are missing.

this entails using an island table so you can do an outer join then similar to the script logic before, trim out all steps that are prior to the max step and then match, anything that does not match (not in the original sample data table) is a missing step

SampleData:
Load * inline [
Date, CandidateName, CandidateStatus, Counter
1/1/2020 ,Dan, Apply for Job, 1
1/3/2020 ,Dan, Pass the Exam, 1
1/5/2020 ,Dan, Interview, 1
1/7/2020 ,Dan, Sign the Offer Letter, 1
1/9/2020 ,Dan, Background Check, 1
2/1/2020 ,Kyle, Apply for Job, 1
2/5/2020 ,Kyle, Interview, 1
2/7/2020 ,Kyle, Sign the Offer Letter, 1
2/8/2020 ,Kyle, Background Check, 1
1/9/2020 ,Jack, Apply for Job, 1
2/8/2020 ,King, Apply for Job, 1
2/9/2020 ,King, Pass the Exam, 1
2/11/2020 ,King, Sign the Offer Letter, 1
3/11/2020 ,Steve, Apply for Job, 1
4/11/2020 ,Steve, Interview, 1
5/11/2020 ,Ray, Apply for Job, 1
6/11/2020 ,Ray, Background Check, 1
];



Steps:
load distinct CandidateStatus Resident SampleData;

inner join (Steps) load * inline [
CandidateStatus, Seq
Apply for Job, 1
Pass the Exam, 2
Interview, 3
Sign the Offer Letter, 4
Background Check, 5
];


//get the last step for each candidate---------

left join (SampleData) load  CandidateStatus, Seq Resident Steps;


NoConcatenate Candidates: load CandidateName, max(Seq) as LastStep Resident SampleData group by CandidateName;


//---------
//this gets the last step taken
inner join (SampleData) load CandidateName, LastStep Resident Candidates;

//this is the island table that will be used to compare to steps taken
NoConcatenate AllSteps: load CandidateStatus as AllCandidateStatus, Seq as AllSeq Resident Steps;

drop table Candidates, Steps;
exit script;

 

then in your straight table chart,  use the field in the island table as a dimension, since the tables arent related, it will be a full outer join 

then add thsi expression:

= if( AllSeq<=LastStep and wildmatch( Concat(DISTINCT CandidateStatus,','),'*' & AllCandidateStatus & '*')=0,1,0)

tis filters out all steps beyond the last step taken by the candidate, then if the step is inside a concatenation of all steps already taken, those are thrown out as well.  whats left will be the steps not taken.

dsiclaimer: there should be a way to fine tune the expression using e() function, however, this will be slow if there are 100K or over 1M rows.  moving the logic to the script will definitely be faster