shep_work
Contributor III

Unable to flatten a hierarchy and retain original rows

I have a set of data that balloons from 23k records to over 123k records after applying hierarchy logic. The data set has up to 7 levels and is made up of approx. 1,300 records per date across 17 individual dates. I need to apply the hierarchy logic per date, both because the 'expanding leaves' stage otherwise gets out of control (over 83 million rows) and because we track where people move from one group to another. The script below gets all the rows and the hierarchy, but it bloats the records returned, making the rest of the app difficult to load and deal with. I have tried a few things from @hic (who is a genius, btw) but I just can't crack it. Any help would be much appreciated.

In this sample the original row count is 22 and the final row count is 40; it is a good representative case of my data set and of the code I am currently using.

[Headcount_Source]:
LOAD
  ID,
  FIRST_NAME,
  LAST_NAME,
  SUPERVISOR_ID,
  num(Date#(DATA_LOAD_DATE, 'YYYY-MM-DD')) AS DATA_LOAD_DATE
  
INLINE [
        ID,FIRST_NAME,LAST_NAME,SUPERVISOR_ID,DATA_LOAD_DATE
        100,Jill,Hill,1234567890,2021-01-01
        210,Don,Mountain,100,2021-01-01
        1234567890,Rachel,Pail,1234567890,2021-01-01
        310,Jennifer,Story,210,2021-01-01
        410,Jackson,Hilltop,310,2021-01-01
        500,Al,Mudd,310,2021-01-01
        100,Jill,Hill,1234567890,2021-02-01
        210,Don,Mountain,100,2021-02-01
        1234567890,Rachel,Pail,1234567890,2021-02-01
        310,Jennifer,Story,210,2021-02-01
        410,Jackson,Hilltop,310,2021-02-01
        500,Al,Mudd,410,2021-02-01
        650,Judy,Judgy,210,2021-02-01
        750,Alice,Wonder,1234567890,2021-02-01
        100,Jill,Hill,1234567890,2021-03-01
        210,Don,Mountain,100,2021-03-01
        1234567890,Rachel,Pail,1234567890,2021-03-01
        310,Jennifer,Story,210,2021-03-01
        410,Jackson,Hilltop,310,2021-03-01
        500,Al,Mudd,210,2021-03-01
        650,Judy,Judgy,100,2021-02-01
        750,Alice,Wonder,1234567890,2021-02-01
];


// Step 1: Extract unique dates from the dataset

[UniqueDates]:
LOAD Distinct
	  DATA_LOAD_DATE AS DATA_LOAD_DATE
RESIDENT Headcount_Source;

LET MyMessage = NoOfRows('UniqueDates') & ' rows UniqueDates table';

Trace $(MyMessage);

LET vDate_List = peek('DATA_LOAD_DATE', 1,'UniqueDates');

Trace $(vDate_List);

Final_HierarchyTable:
LOAD
    Null() AS ID,
    Null() AS FIRST_NAME,
    Null() AS LAST_NAME,
    Null() AS SUPERVISOR_ID,
    Null() AS DATA_LOAD_DATE,
    Null() AS LEVEL,
    Null() AS ParentName,
    Null() AS PathName,
    Null() AS Depth,
    Null() AS LEVEL1,
    Null() AS LEVEL2,
    Null() AS LEVEL3,
    Null() AS LEVEL4,
    Null() AS LEVEL5,
    Null() AS LEVEL6
AUTOGENERATE 0;


FOR i = 0 to NoOfRows('UniqueDates') - 1
	LET vDate = peek('DATA_LOAD_DATE', $(i),'UniqueDates');
    
    
        TRACE Processing data for date: $(vDate);

    // Step 2: Load raw data for the current date
    Headcount_Raw:
    LOAD
      *,
      FIRST_NAME & ' ' & LAST_NAME AS LEVEL
    RESIDENT Headcount_Source
    WHERE DATA_LOAD_DATE = $(vDate);
    
    
    // Step 3: Build hierarchy for the current date
    
    
    [Hierarchy_Table]:
    Hierarchy(ID, SUPERVISOR_ID, LEVEL, ParentName, LEVEL, PathName, '|', Depth)
    NoConcatenate
    LOAD
      *
    RESIDENT Headcount_Raw;

    // Step 4: Concatenate the hierarchy data into the final table
    CONCATENATE (Final_HierarchyTable)
    LOAD
      *

    RESIDENT Hierarchy_Table;

    // Step 5: Drop temporary tables for the current date to free memory
    DROP TABLE Headcount_Raw;
    
    
    TRACE $(vDate);
NEXT    

DROP TABLE [UniqueDates];
DROP TABLE [Hierarchy_Table];
DROP TABLE [Headcount_Source];
9 Replies
hic
Former Employee

Since the hierarchies are different day-to-day (it's a slowly changing dimension), you need to include the date in the ID. Try the following:

[Headcount_Source]:
Hierarchy(NodeID, SUPERVISOR_ID, Name, ParentName, Name, PathName, '|', Depth)
LOAD
    ID,
    ID & '|' & DATA_LOAD_DATE as NodeID,                   // node key includes the date...
    FIRST_NAME,
    LAST_NAME,
    FIRST_NAME & ' ' & LAST_NAME as Name,
    SUPERVISOR_ID & '|' & DATA_LOAD_DATE as SUPERVISOR_ID, // ...as does the parent key, so each date's tree stays separate
    Date(Date#(DATA_LOAD_DATE, 'YYYY-MM-DD')) AS DATA_LOAD_DATE

INLINE [
ID,FIRST_NAME,LAST_NAME,SUPERVISOR_ID,DATA_LOAD_DATE
100,Jill,Hill,1234567890,2021-01-01
210,Don,Mountain,100,2021-01-01
1234567890,Rachel,Pail,1234567890,2021-01-01
310,Jennifer,Story,210,2021-01-01
410,Jackson,Hilltop,310,2021-01-01
500,Al,Mudd,310,2021-01-01
100,Jill,Hill,1234567890,2021-02-01
210,Don,Mountain,100,2021-02-01
1234567890,Rachel,Pail,1234567890,2021-02-01
310,Jennifer,Story,210,2021-02-01
410,Jackson,Hilltop,310,2021-02-01
500,Al,Mudd,410,2021-02-01
650,Judy,Judgy,210,2021-02-01
750,Alice,Wonder,1234567890,2021-02-01
100,Jill,Hill,1234567890,2021-03-01
210,Don,Mountain,100,2021-03-01
1234567890,Rachel,Pail,1234567890,2021-03-01
310,Jennifer,Story,210,2021-03-01
410,Jackson,Hilltop,310,2021-03-01
500,Al,Mudd,210,2021-03-01
650,Judy,Judgy,100,2021-02-01
750,Alice,Wonder,1234567890,2021-02-01
];

marcus_sommer

I'm not very familiar with this kind of hierarchy stuff and my last touch was ages ago, but I think I would start with a different approach: not applying the hierarchy-load within an external load-loop, but doing the hierarchy-load only once and then connecting it with the facts. Maybe with measures like these:

  • creating a from-to validity range from the dates (loading the staff data resident, ordered by ID and Date, and using inter-record functions like previous() and peek() within appropriate if-conditions to query the previous record, deriving the from-to values from Date respectively previous(Date) +/- 1, and setting the missing end to today()); see the sketch below
  • applying the hierarchy-load once against that from-to table
  • expanding a part of the hierarchy table to the dedicated dates with an internal while-loop within a mapping table
  • loading the mapping within the facts to transfer the key between the hierarchy and the facts
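
A minimal sketch of the first bullet (my illustration, not Marcus' code), assuming the [Headcount_Source] table from the question with its numeric DATA_LOAD_DATE; the names [Staff_Changes], [Staff_FromTo], From and To are placeholders:

// Pass 1: keep only the records where the supervisor assignment changes,
// so each ID ends up with one row per distinct state.
[Staff_Changes]:
NoConcatenate
LOAD
    ID,
    SUPERVISOR_ID,
    DATA_LOAD_DATE AS From
RESIDENT Headcount_Source
WHERE IsNull(Previous(ID)) OR ID <> Previous(ID)
   OR SUPERVISOR_ID <> Previous(SUPERVISOR_ID)
ORDER BY ID, DATA_LOAD_DATE;

// Pass 2: close each range one day before the next state of the same ID;
// the most recent state per ID stays open until today().
[Staff_FromTo]:
NoConcatenate
LOAD
    ID,
    SUPERVISOR_ID,
    From,
    If(Previous(ID) = ID, Previous(From) - 1, Num(Today())) AS To
RESIDENT Staff_Changes
ORDER BY ID, From DESC;

DROP TABLE Staff_Changes;

The hierarchy-load would then run once against [Staff_FromTo], carrying From and To along.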

Such a mapping may look like:

m:
mapping load
    ID & '|' & date(from + iterno() - 1),
    recno()
resident hierarchy
while from + iterno() - 1 <= to;

and this may be the call within the fact table:

applymap('m', ID & '|' & FactDate, '#NV') as HierarchyKey
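
For context, a sketch of how that applymap() might sit in the fact load (the [Facts] / Facts_Raw names and the FactDate field are placeholders); note that the key text must match on both sides, hence the same date() formatting as in the mapping:

[Facts]:
LOAD
    ID,
    FactDate,
    // '#NV' marks fact rows whose ID/date falls outside every hierarchy interval
    ApplyMap('m', ID & '|' & Date(FactDate), '#NV') AS HierarchyKey
RESIDENT Facts_Raw;   // hypothetical staging table for the facts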

shep_work
Contributor III
Author

Not sure if I'm understanding correctly, but using this still produces the bloated record count of 40 instead of 22. Is that just unavoidable?

Headcount_Raw << Headcount_Source
 Lines fetched: 10
Hierarchy_Table-NodeList << Headcount_Raw
 Lines fetched: 10
Final_HierarchyTable << ExpandedLeaves
 Lines fetched: 28
Final_HierarchyTable << Hierarchy_Table
 Lines fetched: 40

btw, I replaced the code in the [Hierarchy_Table] section, not the [Headcount_Source] load, because it has to do the hierarchy for each date individually; otherwise it's simply too much for the system to 'chew' on with the full data.

shep_work
Contributor III
Author

I started out trying something similar but couldn't get it to produce a correct hierarchy like the built-in function does, and it took a lot more processing. The entire process needs to produce a hierarchy based on only one field, SUPERVISOR_ID, which means each time a record is read it has to loop through each level at least once. Unless I'm just not seeing something in the suggestion, which is possible.

marcus_sommer

As hinted, I'm not very familiar with the hierarchy-load, but I would think that a staff table with a few tens of thousands of records would be resolved within a minute or two. The preceding from-to creation should also take less than a minute, and the while-loop within the mapping may go to some millions of records, but again I think it's a matter of one or two minutes. Applying applymap() within the facts afterwards is no heavy transformation and would only be noticeable if it breaks an optimized load. All this without an extreme consumption of CPU + RAM.

If this really meets the expectation, you should embed it within an incremental approach to resolve only the current states.

shep_work
Contributor III
Author (Accepted Solution)

I found the solution, which is kinda unexpected, or at least not something I would have expected. It has 2 parts.

Part 1: removal of the [Final_HierarchyTable] table that was auto-generated with nulls as a place to put the unfolded hierarchy. I'm not sure why, but when I was first coding this I needed it because the load kept having no place to put the records after the hierarchy sequence; now it is not necessary.

Part 2 (and this wasn't in my example): the original source data has DATA_LOAD_DATE as the numeric serial, e.g. '45663' instead of '2025-01-06'. I knew that, but what I didn't expect was that in the [UniqueDates] table that reads from that source, any aliasing of that field back to its own name removes all date format conversion; I will show what I mean below. So I kept going in circles, getting either tons and tons of rows or zero rows. To 'fix' this very strange behavior I had to set the [UniqueDates] field to num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE_3 and then, in the peek phase, use LET vDate = Peek('DATA_LOAD_DATE_3', $(i), 'UniqueDates'); so that later, when it was evaluated in WHERE DATA_LOAD_DATE = $(vDate);, the values would match. I have never run across a case where the AS FieldName pattern would alter the format of a field when that FieldName was already in the original table. No other formatting (or no formatting) I tried worked; the Trace of $(vDate) always showed 'YYYY-MM-DD' format regardless of the format changes before or afterward. So the combination in the new code below is the only way it would work. It returns the right number of rows and the right hierarchy. Sheesh. Thank you for your help.

[Headcount_Source]:
LOAD
  ID,
  FIRST_NAME,
  LAST_NAME,
  SUPERVISOR_ID,
  DATA_LOAD_DATE 
INLINE [
        ID,FIRST_NAME,LAST_NAME,SUPERVISOR_ID,DATA_LOAD_DATE
        100,Jill,Hill,1234567890,45663
        210,Don,Mountain,100,45663
        1234567890,Rachel,Pail,1234567890,45663
        310,Jennifer,Story,210,45663
        410,Jackson,Hilltop,310,45663
        500,Al,Mudd,310,45663
        100,Jill,Hill,1234567890,45670
        210,Don,Mountain,100,45670
        1234567890,Rachel,Pail,1234567890,45670
        310,Jennifer,Story,210,45670
        410,Jackson,Hilltop,310,45670
        500,Al,Mudd,410,45670
        650,Judy,Judgy,210,45670
        750,Alice,Wonder,1234567890,45670
        100,Jill,Hill,1234567890,45677
        210,Don,Mountain,100,45677
        1234567890,Rachel,Pail,1234567890,45677
        310,Jennifer,Story,210,45677
        410,Jackson,Hilltop,310,45677
        500,Al,Mudd,210,45677
        650,Judy,Judgy,100,45670
        750,Alice,Wonder,1234567890,45670
];

// Step 1: Extract unique dates from the dataset
[UniqueDates]:
LOAD DISTINCT
    num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE_3
RESIDENT Headcount_Source;

FOR i = 0 TO NoOfRows('UniqueDates') - 1
    LET vDate = Peek('DATA_LOAD_DATE_3', $(i), 'UniqueDates');
    
    TRACE Processing data for date: $(vDate);

    // Step 2: Load raw data for the current date
    [Headcount_Raw]:
    LOAD
        *,
        FIRST_NAME & ' ' & LAST_NAME AS LEVEL
    RESIDENT Headcount_Source
    WHERE DATA_LOAD_DATE = $(vDate);

    // Step 3: Build hierarchy for the current date
    [Hierarchy_Table]:
    Hierarchy(ID, SUPERVISOR_ID, LEVEL, ParentName, LEVEL, PathName, '|', Depth)
    NoConcatenate
    LOAD
        *
    RESIDENT Headcount_Raw;

    // Step 4: Combine hierarchy fields with the original rows
    [Final_HierarchyTable]:
    LOAD
        ID AS Final_ID,
        FIRST_NAME AS Final_FIRST_NAME,
        LAST_NAME AS Final_LAST_NAME,
        SUPERVISOR_ID AS Final_SUPERVISOR_ID,
        DATA_LOAD_DATE AS Final_DATA_LOAD_DATE,
        LEVEL1 AS Final_LEVEL1,
        LEVEL2 AS Final_LEVEL2,
        LEVEL3 AS Final_LEVEL3,
        LEVEL4 AS Final_LEVEL4,
        LEVEL5 AS Final_LEVEL5
    RESIDENT Hierarchy_Table;

    // Step 5: Drop temporary tables for the current date
    DROP TABLE Headcount_Raw;
    DROP TABLE Hierarchy_Table;
NEXT;

DROP TABLE [UniqueDates];
DROP TABLE [Headcount_Source];

Example of field name vs. date format:

	  DATA_LOAD_DATE, // = '45663'
      Date#(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE, // = '45663'
      Date#(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE_1, // = '2025-01-06'
      Date#(DATA_LOAD_DATE,'YYYY-MM-DD'), // = '2025-01-06'
      DATE(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE, // = '45663'
      DATE(DATA_LOAD_DATE,'YYYY-MM-DD') AS DATA_LOAD_DATE_2, // = '2025-01-06'
      DATE(DATA_LOAD_DATE,'YYYY-MM-DD'), // = '2025-01-06'
      num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE, // = '45663'
      num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) AS DATA_LOAD_DATE_3, // = '45663'
      num(Date#(DATA_LOAD_DATE,'YYYY-MM-DD')) // = '45663'
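
A plausible explanation (my reading, not stated in this thread): Qlik stores each distinct value of a field only once, together with the first text representation it was loaded with, so aliasing a formatted date back into a field that already holds the same number as '45663' discards the new format, while a new field name keeps it. A minimal sketch that should reproduce the effect in isolation:

// The source load establishes '45663' as the stored text for the
// value 45663 in the field DATA_LOAD_DATE.
[Src]:
LOAD * INLINE [
DATA_LOAD_DATE
45663
];

[Test]:
LOAD
    // same field name: 45663 already exists in DATA_LOAD_DATE, so the
    // existing '45663' representation wins
    Date(DATA_LOAD_DATE, 'YYYY-MM-DD') AS DATA_LOAD_DATE,
    // new field name: the dual's text '2025-01-06' is kept
    Date(DATA_LOAD_DATE, 'YYYY-MM-DD') AS DATE_FORMATTED
RESIDENT Src;

This would also explain why num(Date#(...)) into the new name DATA_LOAD_DATE_3 works: the value is stored as a plain number, so $(vDate) expands to 45663 and compares numerically in the WHERE clause.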

 

marcus_sommer

It's good that you now have a working solution. But keep in mind that the approach of applying all dates against all people within an outside loop of n loads might be done in a more efficient way, to be able to handle larger datasets as well.

shep_work
Contributor III
Author

Thank you Marcus. So far I haven't found a solution that is more efficient or faster than the built-in Hierarchy module, and trying the example you gave above didn't work to create the data set. But if you have a sample of working code, I would be curious to see it, because I'm always open to making the process better. I am limited in that I don't own the environment the code runs on, so any solution has to be able to run in the Qlik Sense environment.

marcus_sommer

I didn't say not to use the hierarchy-load, but rather not to perform this stuff n times within a loop. Over multiple years, several thousand iterations happen, each initialising a table, accessing the source with filters, doing more or less transformation and then finalising the table load (in general this is independent of the hierarchy-stuff). With rather small datasets it may perform sufficiently, but with larger ones it easily becomes the bottleneck.