Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in Toronto Sept 9th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Split rows iteratively based on timestamp

Hello community,
I have a difficult requirement and I need ideas on how to implement this with Talend.
I have a table on a Redshift db with a startdatetime and a enddatetime column. What I need to do is split every row based on a 30-minute interval, until I reach the enddatetime value. Some examples to make myself clear:
[list=*]

  • Startdatetime: 2016-06-01 03:27:21 and Enddatetime: 2016-06-01 04:13:05 -> Final result should be 3 rows



          From  2016-06-01 03:27:21 to 2016-06-01 03:30:00
          From 2016-06-01 03:30:00 to 2016-06-01 04:00:00
          From 2016-06-01 04:00:00 to 2016-06-01 04:13:05
[list=*]

  • Startdatetime: 2016-06-01 03:11:01 and Enddatetime: 2016-06-01 03:17:58 -> no split

 

  • Startdatetime: 2016-06-01 03:54:26 and Enddatetime: 2016-06-01 04:11:10 -> Final result should be 2 rows



          From 2016-06-01 03:54:26 to 2016-06-01 04:00:00

          From 2016-06-01 04:00:00 to 2016-06-01 04:11:10


Intervals may range from a couple seconds to hours, so the number of splits is variable.
For now, I developed a query in Redshift to calculate for every row the first datetime split (2016-06-01 03:30:00 in my first example) and be able to discriminate with a tFilterRow the rows that really need to be splitted: in this way I can also easily calculate the following datetime splits by simply adding 30 minutes.

Now though, I don't know how to proceed. Can you advise me on the best Talend components to use? I probably need a loop somewhere, but I lack the general picture of the job, so to speak.

Many thanks for your time,
Simona

Labels (2)
3 Replies
Anonymous
Not applicable
Author

You could probably use tJavaRow/tJavaFlex and tNormalize to create a new row that effectively looks like a pivot and then get the tNormalize (or I think there is a tUnpivotRow custom component) to spit out the multiple output rows...
M
Anonymous
Not applicable
Author

I have written a tutorial on Row Multiplication, which is what this is really. It doesn't cover your time logic, but to be honest that is relatively straight forward with a bit of Java. This tutorial covers how to multiply your rows.

https://www.rilhia.com/quicktips/quick-tip-row-multiplication

Where I am passing a "repeatRow" numeric value to the tJavaFlex, you could pass your StartDateTime and EndDateTime columns. Then calculate how many 30 minutes occur between your Start and End dates to identify the number of iterations of the loop. Then simply return the calculated time for each loop in the Main section.

I hope it helps.
Anonymous
Not applicable
Author

most of the "calculation" will need to be done with custom java. Here's a quick example job -- 

0683p000009MFNL.png

Fixed flow setup:


0683p000009MFNQ.png

JavaFlex code:
start:
//for outputting to row
Date outStart = null;
Date outEnd = null;
Boolean stop = false;

//for pretty printing
java.text.SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

//setup StartDateTime
java.util.Calendar startDatetime = java.util.Calendar.getInstance(); // creates a new calendar instance
startDatetime.setTime((Date)globalMap.get("row1.startDatetime")); // assigns calendar to given StartDatetime

//setup EndDateTime
java.util.Calendar endDatetime = java.util.Calendar.getInstance(); // creates a new calendar instance
endDatetime.setTime((Date)globalMap.get("row1.endDatetime")); // assigns calendar to given endDatetime

//setup temp WorkingDateTime
java.util.Calendar workingDatetime = java.util.Calendar.getInstance();
workingDatetime.setTime((Date)globalMap.get("row1.startDatetime"));

//set the working date to the first 30m marker
workingDatetime.set(java.util.Calendar.MINUTE,30);
workingDatetime.set(java.util.Calendar.SECOND, 00);

if( workingDatetime.compareTo(startDatetime) > 0 ) {
//everything is fine, our startDatetime was before the 30m mark
}
else {
//our startDatetime is after the 30m mark
workingDatetime.set(java.util.Calendar.MINUTE,00);
workingDatetime.add(java.util.Calendar.HOUR, 1);
}

System.out.println(" Start " + f.format(startDatetime.getTime()) + " Working: " + f.format(workingDatetime.getTime()) + " End " + f.format(endDatetime.getTime()) );


do {


Main:

if( workingDatetime.compareTo(endDatetime) > 0 ) {
//we've already ranged over our endDatetime. means only one interval with startDateTime and endDateTime
outStart = startDatetime.getTime();
outEnd = endDatetime.getTime();
stop = true;
}
else {
outStart = startDatetime.getTime();
outEnd = workingDatetime.getTime();

//increment start
startDatetime.setTime( workingDatetime.getTime() );
//add 30 M to working datetime
workingDatetime.add(java.util.Calendar.MINUTE, 30 );
}
//emit the rows
row2.startDatetime = outStart;
row2.endDatetime = outEnd;

End:
// end of the component, outside/closing the loop
} while ( !stop ); // end while


Output:


0683p000009MFNV.png