Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi guys. I'm trying to create a table of aggregate values and I was wondering if you guys had an elegant solution.
I have one table called Tasks. Each task row has a Category field, and a Completed field with values Done or Not Done.
I want to create a derived Categories table that pulls out all the values of the Category field in the Tasks table. This table will obviously link to the Tasks table by the Category name. I want to then create two derived fields.
One field will be called HasCompletedTask, and will evaluate to true if there exists at least one Task row where Completed is Done. Otherwise, false.
The other derived field will be called AllTasksCompleted, and will evaluate to true if ALL task rows of that category are Done. Otherwise, false.
Any ideas? Please let me know if you have any questions. Thank you for your help!
Eric
It is probably easier and better to handle this in your Load using flags.
If you want to do it using Expressions, then you'll probably need the Concat() function. This should get you a comma separated list of all Completed values for that respective Category:
Concat(TOTAL <Categories> Completed)
The All Completed field will be easier than the Has Completed, because Done is a substring in Not Done. For the All Completed, try:
Index(Concat(TOTAL <Categories> Completed), 'Not')=0
I'm not sure of a great way to to the Has Completed. You need to search for Done, but not within a Not Done string.
EDIT: I guess you could try:
Index(Concat(TOTAL <Categories> Completed), ',Done')>0 or
Left(Concat(TOTAL <Categories> Completed), 4) = 'Done'
I think I would rather do this in the load script using flags. So on every Task row there's a flag Done or Not Done (I could change this to say Open or Closed or the like). I want to then evaluate, for every Category value, does there exist a Task that's Open. For the other field, I'd want to evaluate if there exists a Task that's Closed. What do you think?
What is the source of your data? Are you pulling from QVDs or using SQL to pull directly from a database?
I always seem to think of the SQL method first, because I've been using that longer. It would be something like this (Oracle syntax):
SELECT Tasks.*, Has.HasComp, All.AllComp FROM Tasks,
(SELECT Category, Max(Completed) As HasComp FROM Tasks
WHERE Completed = 'Done' GROUP BY Category) Has,
(SELECT Category, Max(Completed) As AllComp FROM Tasks
WHERE Completed = 'Not Done' GROUP BY Category) All
WHERE Tasks.Category = Has.Category (+)
AND Tasks.Category = All.Category (+)
Something like that. You would end up modifying the select to return 0 for HasComp if it is null and 1 if not. And the same thing for AllComp.
I'm sure you could also handle this using QlikView Load syntax.
I was able to set up a load using an Inline to create some test data. It's kind of long and probably inefficient. I'll also try to upload the sample file, but that functionality hasn't been working on the forum.
Load:
DataLoad:
LOAD * INLINE [
Task,Category,Completed
A,Cat1,Done
B,Cat2,Not Done
C,Cat1,Not Done
D,Cat2,Not Done
E,Cat3,Done
F,Cat3,Done
];
AllComp:
LOAD Category, Sum(0) As AllC
RESIDENT DataLoad
WHERE Completed = 'Not Done'
GROUP BY Category;
HasComp:
LOAD Category, Max(1) As HasC
RESIDENT DataLoad
WHERE Completed = 'Done'
GROUP BY Category;
DataInt:
LOAD Task As Task1, Category As Cat1,Completed As Comp1 RESIDENT DataLoad;
LEFT JOIN LOAD Category As Cat1, AllC RESIDENT AllComp;
LEFT JOIN LOAD Category As Cat1, HasC RESIDENT HasComp;
DROP TABLE DataLoad;
DROP TABLE AllComp;
DROP TABLE HasComp;
Data:
LOAD Task1 As Task, Cat1 As Category, Comp1 As Completed,
if(AllC <> 0, 1, AllC) As AllComplete, if(HasC <> 1, 0, HasC) As HasComplete
RESIDENT DataInt;
DROP TABLE DataInt;
You could do a load of the initial table, then do a distinct load of just the status field in a temp table basically leaving you either one or two values in a single field.
So, in the example data very kindly provided by NMiller, it would be:
DataLoad:
LOAD * INLINE [
Task,Category,Completed
A,Cat1,Done
B,Cat2,Not Done
C,Cat1,Not Done
D,Cat2,Not Done
E,Cat3,Done
F,Cat3,Done
];
CompletedIndexTemp:
LOAD DISTINCT Completed
RESIDENT DataLoad;
From here, I would use a peek in a variable to determine if 'Not Done' exists, and return text based on the result. My preference is always to use a variable for these kinds of things, but this could easily be done in another table as well.
LET CompletedFlag = If(peek('Completed', 0, 'CompletedIndexTemp') = 'Not Done' OR peek('Completed', 1, 'CompletedIndexTemp') = 'Not Done', 'Not Done', 'Done');