TOS - Help with Data transformation logic
Hello everyone, I have to build an ETL job and I am stuck on a piece of transformation logic; maybe someone can help me. Simplified a lot, my file is structured as below:
| ACTIVITY | IN | OUT |
| A | 1 | 2 |
| A | 2 | 3 |
| A | 3 | 4 |
| A | 6 | 7 |
The logic is that if there are 2 or more sequential rows (the OUT of one row equals the IN of the next), I have to aggregate them, keeping the minimum IN and the maximum OUT. So the result has to be:
| ACTIVITY | IN | OUT |
| A | 1 | 4 |
| A | 6 | 7 |
If there were just the first 3 rows, it would be easy using a tAggregate, but I don't know how to tell Talend that it has to aggregate only the sequential ones. (In reality, IN and OUT are dates.)
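Basically, you should think in terms of partition buckets: consecutive rows belong to the same bucket as long as each row's IN matches the previous row's OUT, and a new bucket starts whenever that chain breaks. Once you have your partition buckets, you just take the min of IN and the max of OUT per bucket and you have the answer.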
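To make the bucket idea concrete, here is a minimal standalone Java sketch. It is not Talend-generated code or a tJavaFlex snippet: the `Row` class and the `merge` method are hypothetical names used only for illustration, and it assumes the rows are already sorted by ACTIVITY and then by IN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeSequential {

    // Hypothetical row type mirroring the ACTIVITY / IN / OUT columns.
    static final class Row {
        final String activity;
        final int in;
        final int out;

        Row(String activity, int in, int out) {
            this.activity = activity;
            this.in = in;
            this.out = out;
        }
    }

    // Assumption: rows are sorted by ACTIVITY, then by IN.
    // Each run of rows where OUT == next IN forms one bucket.
    static List<Row> merge(List<Row> rows) {
        List<Row> result = new ArrayList<>();
        Row current = null; // the bucket being built
        for (Row row : rows) {
            boolean continuesBucket = current != null
                    && current.activity.equals(row.activity)
                    && current.out == row.in;
            if (continuesBucket) {
                // Same bucket: keep the first IN, extend OUT.
                current = new Row(current.activity, current.in,
                        Math.max(current.out, row.out));
            } else {
                // Chain broken: emit the finished bucket, start a new one.
                if (current != null) {
                    result.add(current);
                }
                current = row;
            }
        }
        if (current != null) {
            result.add(current); // emit the last bucket
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("A", 1, 2),
                new Row("A", 2, 3),
                new Row("A", 3, 4),
                new Row("A", 6, 7));
        for (Row r : merge(rows)) {
            System.out.println(r.activity + " | " + r.in + " | " + r.out);
        }
        // prints:
        // A | 1 | 4
        // A | 6 | 7
    }
}
```

For date columns, the same comparison would presumably use `equals` on the date type instead of `==`, and the same loop could be adapted to run row by row inside a Talend Java component.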
Here is an example using integers, but you can use the same logic for dates. I have done it with a tJavaFlex and some code, as that is the fastest approach if your data is sorted on IN and looks just like the data below.