I have a set of data that is ballooning from 23k records to over 123k records after applying hierarchy logic. The data set has up to 7 levels and is made up of approx. 1,300 records with 17 individual dates. I need to apply the hierarchy logic per date, both because it otherwise gets out of control at over 83 million rows during the 'expanding leaves' stage and because we track where people go from one group to another. The script below gets all the rows and the hierarchy, but it bloats the records returned, making the rest of the app difficult to load and deal with. I have tried a few things from @hic (who is a genius, btw), but I just can't crack it. Any help would be much appreciated.
In this sample the original row count is 22 and the final row count is 40; it is a good representative case of my data set and of the code I am currently using.
[Headcount_Source]:
LOAD
ID,
FIRST_NAME,
LAST_NAME,
SUPERVISOR_ID,
num(Date#(DATA_LOAD_DATE, 'YYYY-MM-DD')) AS DATA_LOAD_DATE
INLINE [
ID,FIRST_NAME,LAST_NAME,SUPERVISOR_ID,DATA_LOAD_DATE
100,Jill,Hill,1234567890,2021-01-01
210,Don,Mountain,100,2021-01-01
1234567890,Rachel,Pail,1234567890,2021-01-01
310,Jennifer,Story,210,2021-01-01
410,Jackson,Hilltop,310,2021-01-01
500,Al,Mudd,310,2021-01-01
100,Jill,Hill,1234567890,2021-02-01
210,Don,Mountain,100,2021-02-01
1234567890,Rachel,Pail,1234567890,2021-02-01
310,Jennifer,Story,210,2021-02-01
410,Jackson,Hilltop,310,2021-02-01
500,Al,Mudd,410,2021-02-01
650,Judy,Judgy,210,2021-02-01
750,Alice,Wonder,1234567890,2021-02-01
100,Jill,Hill,1234567890,2021-03-01
210,Don,Mountain,100,2021-03-01
1234567890,Rachel,Pail,1234567890,2021-03-01
310,Jennifer,Story,210,2021-03-01
410,Jackson,Hilltop,310,2021-03-01
500,Al,Mudd,210,2021-03-01
650,Judy,Judgy,100,2021-02-01
750,Alice,Wonder,1234567890,2021-02-01
];
// Step 1: Extract unique dates from the dataset
[UniqueDates]:
LOAD Distinct
DATA_LOAD_DATE AS DATA_LOAD_DATE
RESIDENT Headcount_Source;
LET MyMessage = NoOfRows('UniqueDates') & ' rows UniqueDates table';
Trace $(MyMessage);
LET vDate_List = peek('DATA_LOAD_DATE', 1,'UniqueDates');
Trace $(vDate_List);
Final_HierarchyTable:
LOAD
Null() AS ID,
Null() AS FIRST_NAME,
Null() AS LAST_NAME,
Null() AS SUPERVISOR_ID,
Null() AS DATA_LOAD_DATE,
Null() AS LEVEL,
Null() AS ParentName,
Null() AS PathName,
Null() AS Depth,
Null() AS LEVEL1,
Null() AS LEVEL2,
Null() AS LEVEL3,
Null() AS LEVEL4,
Null() AS LEVEL5,
Null() AS LEVEL6
AUTOGENERATE 0;
FOR i = 0 to NoOfRows('UniqueDates') - 1
LET vDate = peek('DATA_LOAD_DATE', $(i),'UniqueDates');
TRACE Processing data for date: $(vDate);
// Step 4: Load raw data for the current date
Headcount_Raw:
LOAD
*,
FIRST_NAME & ' ' & LAST_NAME AS LEVEL
RESIDENT Headcount_Source
WHERE DATA_LOAD_DATE = $(vDate);
// Step 5: Build hierarchy for the current date
[Hierarchy_Table]:
Hierarchy(ID, SUPERVISOR_ID, LEVEL, ParentName, LEVEL, PathName, '|', Depth)
NoConcatenate
LOAD
*
RESIDENT Headcount_Raw;
// Step 6: Concatenate the hierarchy data into the final table
CONCATENATE (Final_HierarchyTable)
LOAD
*
RESIDENT Hierarchy_Table;
// Step 7: Drop temporary tables for the current date to free memory
DROP TABLE Headcount_Raw;
DROP TABLE Hierarchy_Table; // drop inside the loop so the next NoConcatenate load can reuse the table name
TRACE $(vDate);
NEXT
DROP TABLE [UniqueDates];
DROP TABLE [Headcount_Source];
Since the hierarchies are different day-to-day (it's a slowly changing dimension), you need to include the date in the ID. Try the following:
[Headcount_Source]:
Hierarchy(NodeID, SUPERVISOR_ID, Name, ParentName, Name, PathName, '|', Depth)
LOAD
ID,
ID & '|' & DATA_LOAD_DATE as NodeID,
FIRST_NAME,
LAST_NAME,
FIRST_NAME & ' ' & LAST_NAME as Name,
SUPERVISOR_ID & '|' & DATA_LOAD_DATE as SUPERVISOR_ID,
Date(Date#(DATA_LOAD_DATE, 'YYYY-MM-DD')) AS DATA_LOAD_DATE
INLINE [
ID,FIRST_NAME,LAST_NAME,SUPERVISOR_ID,DATA_LOAD_DATE
100,Jill,Hill,1234567890,2021-01-01
210,Don,Mountain,100,2021-01-01
1234567890,Rachel,Pail,1234567890,2021-01-01
310,Jennifer,Story,210,2021-01-01
410,Jackson,Hilltop,310,2021-01-01
500,Al,Mudd,310,2021-01-01
100,Jill,Hill,1234567890,2021-02-01
210,Don,Mountain,100,2021-02-01
1234567890,Rachel,Pail,1234567890,2021-02-01
310,Jennifer,Story,210,2021-02-01
410,Jackson,Hilltop,310,2021-02-01
500,Al,Mudd,410,2021-02-01
650,Judy,Judgy,210,2021-02-01
750,Alice,Wonder,1234567890,2021-02-01
100,Jill,Hill,1234567890,2021-03-01
210,Don,Mountain,100,2021-03-01
1234567890,Rachel,Pail,1234567890,2021-03-01
310,Jennifer,Story,210,2021-03-01
410,Jackson,Hilltop,310,2021-03-01
500,Al,Mudd,210,2021-03-01
650,Judy,Judgy,100,2021-02-01
750,Alice,Wonder,1234567890,2021-02-01
];
I'm not very familiar with this kind of hierarchy stuff and my last touch was ages ago, but I think I would start with a different approach: not applying the hierarchy load within an external load loop, but rather doing the hierarchy load only once and then connecting it to the facts with a mapping. Such a mapping may look like:
m: mapping load ID & '|' & date(from + iterno() - 1), recno()
resident hierarchy while from + iterno() - 1 <= to;
and the call for it within the fact load may be:
applymap('m', ID & '|' & FactDate, '#NV') as HierarchyKey
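To make that fragment self-contained, here is a minimal sketch of one reading of the interval idea. Everything except the mapping load and the applymap() call is an assumption: the validity intervals and fact rows are invented sample data, and Marcus's from/to fields are renamed FromDate/ToDate to avoid the FROM keyword.
// Assumed input: one row per resolved hierarchy node with a validity window.
hierarchy:
LOAD
    ID,
    Date#(FromDate, 'YYYY-MM-DD') AS FromDate,
    Date#(ToDate,   'YYYY-MM-DD') AS ToDate
INLINE [
ID, FromDate, ToDate
100, 2021-01-01, 2021-01-03
210, 2021-01-02, 2021-01-04
];
// Expand each node into one mapping row per valid day:
// key = ID|day, value = the node's row number in the hierarchy table.
m:
MAPPING LOAD
    ID & '|' & Date(FromDate + IterNo() - 1),
    RecNo()
RESIDENT hierarchy
WHILE FromDate + IterNo() - 1 <= ToDate;
// Facts then pick up their hierarchy row in a single pass;
// '#NV' flags fact dates outside any validity window.
Facts:
LOAD
    *,
    ApplyMap('m', ID & '|' & Date(Date#(FactDate, 'YYYY-MM-DD')), '#NV') AS HierarchyKey
INLINE [
ID, FactDate
100, 2021-01-02
210, 2021-01-05
];
With this sample data, the first fact row resolves to hierarchy row 1 and the second falls outside 210's window, so it gets '#NV'.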
Not sure if I'm understanding correctly, but using this still produces the bloated record count of 40 instead of 22; is that just unavoidable?
Headcount_Raw << Headcount_Source
Lines fetched: 10
Hierarchy_Table-NodeList << Headcount_Raw
Lines fetched: 10
Final_HierarchyTable << ExpandedLeaves
Lines fetched: 28
Final_HierarchyTable << Hierarchy_Table
Lines fetched: 40
btw, I replaced the code in the [Hierarchy_Table] section, not [Headcount_Source], because it has to do the hierarchy for each date individually; otherwise it's simply too much for the system to 'chew' on with the full data.
I started out trying something similar but couldn't get it to do a correct hierarchy like the built-in function does, and it took a lot more processing. The entire process needs to produce a hierarchy based on only one field, SUPERVISOR_ID, which means each time a record is read it has to loop through each level at least once. Unless I'm just not seeing something in the suggestion, which is possible.
As hinted, I'm not very familiar with the hierarchy load, but I would think that a staff table with a few tens of thousands of records would be resolved within a minute or two. The preceding from/to creation should also take less than a minute, and the loop within the mapping may produce some millions of records, but again I think that's a matter of one or two minutes. Applying applymap() within the facts is no heavy transformation and would only be noticeable if it breaks an optimized load. All this without extreme consumption of CPU + RAM.
If this really meets expectations, you should embed it within an incremental approach to resolve only the current states.
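For what such an incremental wrapper around the per-date loop could look like, here is a minimal hedged sketch; the QVD path and variable names are assumptions, not from the thread. The idea is to persist the resolved table and, on the next reload, only loop over dates newer than the last one stored.
LET vQvd = 'lib://Data/Final_HierarchyTable.qvd';  // hypothetical storage location
LET vLastDate = 0;                                 // default: resolve every date
IF Not IsNull(FileSize('$(vQvd)')) THEN            // a QVD exists from a prior run
    Final_HierarchyTable:
    LOAD * FROM [$(vQvd)] (qvd);                   // previously resolved states
    [MaxDate]:
    LOAD Max(DATA_LOAD_DATE) AS MaxDate RESIDENT Final_HierarchyTable;
    LET vLastDate = Peek('MaxDate', 0, 'MaxDate');
    DROP TABLE [MaxDate];
END IF
// Then restrict the UniqueDates driver table to unprocessed dates only:
//   ... WHERE DATA_LOAD_DATE > $(vLastDate);
// and after the loop, persist the combined result for the next run:
//   STORE Final_HierarchyTable INTO [$(vQvd)] (qvd);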
I found the solution, which is kind of unexpected, or at least not something I would have expected. This has two parts.
Part 1: removal of the [Final_HierarchyTable] table that was auto-generated with nulls as a place to put the unfolded hierarchy. I'm not sure why, but when I first coded this I needed it because otherwise there was no place to put the records after the hierarchy sequence; that is no longer necessary.
Part 2, which wasn't in my example: the original source data has DATA_LOAD_DATE as the numeric value, e.g. '45663' instead of '2025-01-06'. I knew that, but what I didn't expect was that in the [UniqueDates] table reading from that source, renaming the field as itself removes all date format conversion (presumably because the existing field's number format wins whenever the same field name is reused, as the examples below suggest). I will show what I mean below. So I kept going in circles, getting either tons and tons of rows or zero rows. To 'fix' this very strange behavior I had to set the [UniqueDates] field to num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE_3 and then, in the Peek phase, use LET vDate = Peek('DATA_LOAD_DATE_3', $(i), 'UniqueDates'); so that later, when it was evaluated in WHERE DATA_LOAD_DATE = $(vDate);, the values would match. I have never before run across a case where aliasing with AS FieldName alters the format of the field when that FieldName exists in the original table. No other formatting (or lack of formatting) I tried worked; the TRACE of $(vDate) always showed the 'YYYY-MM-DD' format regardless of format changes before or afterward. So the combination in the new code below is the only way it would work. It returns the right number of rows and the right hierarchy. Sheesh. Thank you for your help.
[Headcount_Source]:
LOAD
ID,
FIRST_NAME,
LAST_NAME,
SUPERVISOR_ID,
DATA_LOAD_DATE
INLINE [
ID,FIRST_NAME,LAST_NAME,SUPERVISOR_ID,DATA_LOAD_DATE
100,Jill,Hill,1234567890,45663
210,Don,Mountain,100,45663
1234567890,Rachel,Pail,1234567890,45663
310,Jennifer,Story,210,45663
410,Jackson,Hilltop,310,45663
500,Al,Mudd,310,45663
100,Jill,Hill,1234567890,45670
210,Don,Mountain,100,45670
1234567890,Rachel,Pail,1234567890,45670
310,Jennifer,Story,210,45670
410,Jackson,Hilltop,310,45670
500,Al,Mudd,410,45670
650,Judy,Judgy,210,45670
750,Alice,Wonder,1234567890,45670
100,Jill,Hill,1234567890,45677
210,Don,Mountain,100,45677
1234567890,Rachel,Pail,1234567890,45677
310,Jennifer,Story,210,45677
410,Jackson,Hilltop,310,45677
500,Al,Mudd,210,45677
650,Judy,Judgy,100,45670
750,Alice,Wonder,1234567890,45670
];
// Step 1: Extract unique dates from the dataset
[UniqueDates]:
LOAD DISTINCT
num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE_3
RESIDENT Headcount_Source;
FOR i = 0 TO NoOfRows('UniqueDates') - 1
LET vDate = Peek('DATA_LOAD_DATE_3', $(i), 'UniqueDates');
TRACE Processing data for date: $(vDate);
// Step 2: Load raw data for the current date
[Headcount_Raw]:
LOAD
*,
FIRST_NAME & ' ' & LAST_NAME AS LEVEL
RESIDENT Headcount_Source
WHERE DATA_LOAD_DATE = $(vDate);
// Step 3: Build hierarchy for the current date
[Hierarchy_Table]:
Hierarchy(ID, SUPERVISOR_ID, LEVEL, ParentName, LEVEL, PathName, '|', Depth)
NoConcatenate
LOAD
*
RESIDENT Headcount_Raw;
// Step 4: Combine hierarchy fields with the original rows
[Final_HierarchyTable]:
LOAD
ID AS Final_ID,
FIRST_NAME AS Final_FIRST_NAME,
LAST_NAME AS Final_LAST_NAME,
SUPERVISOR_ID AS Final_SUPERVISOR_ID,
DATA_LOAD_DATE AS Final_DATA_LOAD_DATE,
LEVEL1 AS Final_LEVEL1,
LEVEL2 AS Final_LEVEL2,
LEVEL3 AS Final_LEVEL3,
LEVEL4 AS Final_LEVEL4,
LEVEL5 AS Final_LEVEL5
RESIDENT Hierarchy_Table;
// Step 5: Drop temporary tables for the current date
DROP TABLE Headcount_Raw;
DROP TABLE Hierarchy_Table;
NEXT;
DROP TABLE [UniqueDates];
DROP TABLE [Headcount_Source];
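A small optional addition (not in the original post) to confirm the counts described above: tracing the final table size after the loop should show the original 22 rows for this sample, rather than the earlier bloated 40.
// Optional sanity check after the loop:
LET vFinalRows = NoOfRows('Final_HierarchyTable');
TRACE Final_HierarchyTable: $(vFinalRows) rows, expected 22 for this sample;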
Example of field name vs. date format (the comment on each line shows the displayed result):
DATA_LOAD_DATE, // = '45663'
Date#(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE, // = '45663'
Date#(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE_1, // = '2025-01-06'
Date#(DATA_LOAD_DATE,'YYYY-MM-DD'), // = '2025-01-06'
DATE(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE, // = '45663'
DATE(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE_2, // = '2025-01-06'
DATE(DATA_LOAD_DATE,'YYYY-MM-DD'), // = '2025-01-06'
num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE, // = '45663'
num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE_3, // = '45663'
num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) // = '45663'
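A likely reason the Peek/WHERE combination only works with the numeric value (my inference; the thread doesn't spell it out): $(vDate) is plain text substitution, so whatever string the variable holds is pasted into the WHERE clause as-is.
// Hypothetical illustration of dollar-sign expansion, not code from the thread:
SET vDate = 2025-01-06;  // a formatted date held as text
// WHERE DATA_LOAD_DATE = $(vDate)  expands to  WHERE DATA_LOAD_DATE = 2025-01-06
// which Qlik evaluates as the subtraction 2025 - 01 - 06 = 2018, so no rows match.
SET vDate = 45663;       // the same date as its underlying serial number
// WHERE DATA_LOAD_DATE = $(vDate)  expands to  WHERE DATA_LOAD_DATE = 45663
// a plain numeric comparison that matches the stored values, hence num(Date#(...)).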
It's good that you now have a working solution. But keep in mind that applying all dates against all people within an outer loop with n loads might be done in a more efficient way, to be able to handle larger datasets as well.
Thank you Marcus. So far I haven't found a solution that is more efficient or faster than the built-in Hierarchy module, and trying the example you gave above didn't work to create the data set, but if you have a sample of working code I would be curious to see it, because I'm always open to making the process better. I am limited in that I don't own the environment the code runs on, so any solution has to be able to run in the Qlik Sense environment.
I didn't say not to use the hierarchy load, but rather not to perform this stuff n times within a loop. Over multiple years that means several thousand iterations, each initializing a table, accessing the source with filters, doing more or less transformation, and then finalizing the table load (in general this is independent of the hierarchy stuff). With rather small datasets it may perform sufficiently, but with larger ones it easily becomes the bottleneck.