Pierre_B08
Contributor

Data mapping optimization issue

Hello,

 

I'm facing some issues with one of my jobs in Talend Open Studio. The input is an Excel file with the following format:

Keys \ Parameters   X1           X2           X3           X4
A                   some value   some value   some value   some value
B                   some value   some value   some value   some value
C                   some value   some value   some value   some value
D                   some value   some value   some value   some value

 

The tough part is that I can have a couple hundred parameters, and they are not always all present in the file, so for each parameter I need to check whether it exists in the file. To do so I'm using multiple tMaps, because a single tMap component cannot hold all my parameters, which forces me to split the job into several subjobs. On top of that, for each parameter that is present in the file I need to add another column holding the parameter code, which doubles the number of expressions in my already massive tMaps. Afterwards I need to split the data so that I get one row per Key and Parameter, and one row per Key and Parameter Code. I first split the data into one row for the parameter values and one row for the codes, i.e.:

rowNumber   Key       X1           X2
1           row.Key   row.X1       row.X2
2           row.Key   row.X1Code   row.X2Code

I then split these rows again to reach my final format:

rowNumber       Key       Data
row.rowNumber   row.Key   row.X1
row.rowNumber   row.Key   row.X2
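
To make the transformation clearer, here is a rough plain-Java sketch of the unpivot I'm trying to reproduce with tMaps. The class, method and column names and the parameterCodes lookup are only examples for illustration, not my actual job:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: wide Excel row -> one output line per Key/Parameter
// and one output line per Key/ParameterCode.
public class UnpivotSketch {

    // Target format: rowNumber | Key | Data
    static class OutputRow {
        final int rowNumber;
        final String key;
        final String data;

        OutputRow(int rowNumber, String key, String data) {
            this.rowNumber = rowNumber;
            this.key = key;
            this.data = data;
        }

        @Override
        public String toString() {
            return rowNumber + " | " + key + " | " + data;
        }
    }

    static List<OutputRow> unpivot(Map<String, String> excelRow,
                                   List<String> knownParameters,
                                   Map<String, String> parameterCodes) {
        List<OutputRow> out = new ArrayList<>();
        String key = excelRow.get("Key");

        for (String param : knownParameters) {
            // A parameter may be missing from the file: skip it in that case.
            if (!excelRow.containsKey(param)) {
                continue;
            }
            // rowNumber 1: the parameter value read from the file.
            out.add(new OutputRow(1, key, excelRow.get(param)));
            // rowNumber 2: the extra "code" column added for every present parameter.
            out.add(new OutputRow(2, key, parameterCodes.get(param)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> excelRow = new LinkedHashMap<>();
        excelRow.put("Key", "A");
        excelRow.put("X1", "some value");
        excelRow.put("X2", "some value");

        Map<String, String> codes = Map.of("X1", "X1Code", "X2", "X2Code");

        for (OutputRow r : unpivot(excelRow, List.of("X1", "X2", "X3", "X4"), codes)) {
            System.out.println(r);
        }
    }
}
```

In the actual job the same thing is done in two passes (the value/code split first, then one row per column), which is where the huge tMaps come from.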

 

This results in several thousand rows and eventually in a crash. It also seems that the last twenty or so expressions in my tMaps are not working as intended: they produce a different value from the other columns, even though they use the same expressions. Is there a limit to the number of columns you can use in a tMap?

 

The method I'm using is probably not the most efficient one, so I'm open to any advice on how to handle this in a more optimized way and avoid the crashes and strange job behaviour.

I was thinking of a workaround that would involve using multiple jobs instead of a single one, but that may not be an option.

 

Kind regards,

Pierre

1 Reply
Pierre_B08
Contributor
Author

Hi,

Any ideas on this?

 

Thanks,

Pierre