Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
GP1608113771
Contributor II
Contributor II

How to achieve the functionality of Recursive CTE in Talend, without using Database

Hi All,

Could someone please help me with the below scenario. I have to find the genealogy for the below input table. I achieved this using Recursive CTE functionality during discovery phase of my project . But now the architecture of my project got changed for Alpha phase.

We are using Azure Data lake storage , this Azure data lake will be mounted on Azure Data bricks. we are using Azure delta lake tables to access the files in ADLS. Recursive CTE functionality is not supported in Azure Delta lake. 

I have to achieve this functionality in Talend Data fabric cloud version only. Could someone please tell me how to achieve this in Talend itself without using database. Please find the below input table structure and the expected output as well.

Input table

Parent_ID ID

NULL 001

001 002

002 003

003 004

004 005

Expected Output

Lvl ID Geneology

1 001 001

2 002 001/002

3 003 001/002/003

4 004 001/002/003/004

5 005 001/002/003/004/005

Kind Regards,

Gopinath P

Labels (3)
1 Reply
Anonymous
Not applicable

Recursion is a really useful tool, but it is also not the most efficient tool in the box when it comes to memory. So you must be careful when using it. Talend does not support recursion within Jobs, but you can use recursive routines to solve problems like this. Below is a routine that I have edited to suit your requirement....

 

*********************************************************************

 

package routines;

 

import java.util.HashMap;

import java.util.Map;

 

public class RecursiveRelationship {

 

  private static Map<String,RecursiveRelationship> rrMap= new HashMap<>();

  private String child;

  private String parent;

 

  public String getParent() {

    return parent;

  }

 

  public static RecursiveRelationship createChild(String child, String parent){

  RecursiveRelationship e= new RecursiveRelationship(child, parent);

    rrMap.put(child,e);

    return e;

  }

 

  private RecursiveRelationship(String child, String parent) {

    this.child = child;

    this.parent = parent;

  }

 

  

  private String getRelationship(String child){

    if (RecursiveRelationship.rrMap.get(child).getParent() == null) {

      return child;

    }else {

      return getRelationship(RecursiveRelationship.rrMap.get(child).getParent()) + "/" +

      child;

    }

  }

  

 

  @Override

  public String toString() {

  

    String relationship = getRelationship(child).equals(parent)?"":getRelationship(child);

    

    return child + "," + relationship;

 

  }

 

}

*********************************************************************

 

To use this, I set up a quick job with your data in a tFixedFlowInput component linking to a tJavaFlex. In the START CODE section of the tJavaFlex, I used this code......

 

java.util.List<RecursiveRelationship> emps = new java.util.ArrayList<>();

 

In the MAIN CODE section, I used this code......

 

emps.add(RecursiveRelationship.createChild(row1.ID,row1.Parent_ID));

 

In the END CODE section, I used this code.....

 

for(RecursiveRelationship emp:emps){

System.out.println(emp.toString());

}

 

Obviously if you cant to pass this data to other components you will need to put the for loop in a new tJavaFlex that starts a SubJob (the "for" in the START CODE, the "emp.toString()" in the MAIN CODE and the closing parenthesis in the END CODE) and that will pass the output one row at a time.

 

The output I got was as follows. I added my own parent child data as well to test it....

 

Parent ID| Child

null | 001

001 | 002

002 | 003

003 | 004

004 | 005

null | a

a | b

c | d

b | c

 

The output was this.........

001,001

002,001/002

003,001/002/003

004,001/002/003/004

005,001/002/003/004/005

a,a

b,a/b

d,a/b/c/d

c,a/b/c