re_weavers
Contributor II

Help! How to denormalise arbitrary numbers of multiple columns

Hi,
I'm struggling to get one of our ETL jobs working as flexibly as is needed.
We have a data file like so:
A,B,C,D,F1,F2,G1,G2,H1,H2 and so on (in reality there are 9 columns per "set" rather than the 2 in this example).
There is an arbitrary number of column pairs on the end, and the count differs on every row.
We essentially want to produce something like this:
A,B,C,D,F1,F2
A,B,C,D,G1,G2
A,B,C,D,H1,H2
I've tried using tFileInputRegex to produce A,B,C,D,pair_string columns and then used tDenormalize, but tDenormalize only appears to work on a single column, so we'd end up with:
A,B,C,D,F1
A,B,C,D,F2
Using a really wide table and then using split row is fine, except that we have to "hard-code" a maximum number of pairs.
It feels like I need some sort of tNormalizeMultipleColumns or tExtractRegexFields_Recursive component to do this.
I'm hoping I've missed something obvious and someone can just go "you need to whack together X,Y,Z and it'll work fine!"
Help!
Thanks,
Rob
9 Replies
alevy
Specialist

I don't think there's an easy way to do this. I'd look at reading the file with tFileInputFullRow, then in tJavaRow splitting the fields into an array and looping through the results, putting them back together in groups but with a different separator between the groups, so that tNormalize can then put each group on a separate row.
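A minimal plain-Java sketch of that regrouping step (the class and method names are illustrative; in the job this logic would live inside a tJavaRow, and the group size of 2 matches the simplified example):

```java
// Regroup a full CSV row: keep the first `fixed` fields comma-separated,
// then keep commas inside each `groupSize`-field block but put semicolons
// between the blocks, so tNormalize can later split on ";".
public class Regroup {
    public static String regroup(String line, int fixed, int groupSize) {
        String[] f = line.split(",", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fixed; i++) {          // fixed prefix: A,B,C,D
            if (i > 0) sb.append(',');
            sb.append(f[i]);
        }
        for (int i = fixed; i < f.length; i += groupSize) {
            sb.append(i == fixed ? ',' : ';');     // ';' between groups only
            for (int j = 0; j < groupSize && i + j < f.length; j++) {
                if (j > 0) sb.append(',');
                sb.append(f[i + j]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(regroup("A,B,C,D,F1,F2,G1,G2,H1,H2", 4, 2));
        // A,B,C,D,F1,F2;G1,G2;H1,H2
    }
}
```

The resulting single pair_string column ("F1,F2;G1,G2;H1,H2") can then be normalized on ";" to get one group per row.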
Anonymous
Not applicable

You could try the Talend Exchange component tSchemaNormalize.
First you need a schema that reflects the full width of your input data; that schema can then be normalized into the one you need.
http://www.talendforge.org/exchange/index.php?eid=817&product=tos&action=view&nav=1,1,1
re_weavers
Contributor II
Author

I don't think there's an easy way to do this. I'd look at reading the file with tFileInputFullRow, then in tJavaRow splitting the fields into an array and looping through the results, putting them back together in groups but with a different separator between the groups, so that tNormalize can then put each group on a separate row.

This is pretty much where I was heading; however, I was trying to find a way to "produce" rows from the tJava or tJavaFlex components, but it seems to be impossible/not well documented. I'll try switching every n-th comma to a semicolon for now and use tNormalize; thanks for the suggestion.
You could try the Talend Exchange component tSchemaNormalize.

Thanks! I'd not seen that, very useful! Unfortunately, as you said, I'd have to set a maximum number of columns, so this won't quite fit the bill.
Many thanks for your input!
Rob
alevy
Specialist

You could try coding a loop into tJavaFlex to create multiple rows (I did get this to work once, but then couldn't make it work the next time I tried), but this is very non-standard and liable to break in later versions.
Anonymous
Not applicable

To make tJavaFlex produce data, you have a few options:
As the "start" component:
This is the easy way: start a loop in the "Begin" section, assign values to the output row in the "Main" section, and close the loop in the "End" section.
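A plain-Java illustration of how those three sections nest when the flex starts the subjob (this is not actual Talend-generated code, just the loop shape it produces; names and row values are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the tJavaFlex "start component" pattern: the Begin section
// opens a loop, the Main section builds one output row per iteration,
// and the End section closes the loop.
public class FlexPattern {
    public static List<String> run() {
        List<String> rows = new ArrayList<>();
        // --- Begin section: open the loop ---
        for (int i = 1; i <= 3; i++) {
            // --- Main section: assign values to the output row ---
            rows.add("row_" + i);
        // --- End section: close the loop ---
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [row_1, row_2, row_3]
    }
}
```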
As part of a flow:
When the flex is part of a flow, the Main section is called once per input row and always produces at least one output row per input row (so filtering rows is not really possible). To add extra rows mid-flow, you need a loop in your Main section that assigns to the output row; this loop gets processed once per input row.

Edit: on second thought, adding rows when the flex is part of a flow is probably not possible. Use a tMap to multiply and then a flex to process.
alevy
Specialist

Like I said, I did manage to get the flex to filter or multiply my rows by putting a for-loop declaration in the Main section and closing it in the End section, but it depends on which other components are in your flow. It worked following tSort but not following tFileInput (because my closing brace was placed after the code closing the input).
re_weavers
Contributor II
Author

Edit: on second thought, adding rows when the flex is part of a flow is probably not possible. Use a tMap to multiply and then a flex to process.

Hiya,
Sorry, I'm totally confused by that; how can you configure a tMap to produce rows?
Thanks,
Anonymous
Not applicable

I have no clue what prevents you from using my suggestion. The mentioned component tNormalizeSchema does EXACTLY what you need.
This component has PDF documentation, and if you take a look at the examples, you will find exactly your use case.
I created this component because I had a use case with 44 columns in 23 sets. You can probably imagine I was totally unable to handle this with the built-in possibilities.
re_weavers
Contributor II
Author

I have no clue what prevents you from using my suggestion. The mentioned component tNormalizeSchema does EXACTLY what you need.
This component has PDF documentation, and if you take a look at the examples, you will find exactly your use case.

Hi jlolling,
Apologies, I might not have explained the situation correctly. I have actually investigated this component (after your suggestion); what's stopping me is that the number of columns in the input schema varies wildly. Some of these files are 60 columns, others are 700; the number of columns is expected to increase over time and *needs* to be dynamic (a requirement set on me by the data provider ¬_¬; I'm currently trying to get them to change it to a more sensible schema, but that's another issue).
Therefore (sadly, I might add!) I can't create the input schema in a generic way; I was hoping I might be able to team it with a dynamic schema (that takes the names from the headings). However, the headings don't go all the way out!
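For what it's worth, the fully dynamic expansion could be sketched as a Talend routine along these lines (names are illustrative; it assumes 4 fixed columns, a comma-delimited full row read via tFileInputFullRow, and a known group size, but no maximum pair count):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Expand one full-row string with `fixed` leading columns plus a variable
// number of `groupSize`-field groups into one output line per group, each
// carrying the fixed prefix. Works for any row width.
public class ExpandPairs {
    public static List<String> expand(String line, int fixed, int groupSize) {
        String[] f = line.split(",", -1);
        String prefix = String.join(",", Arrays.copyOfRange(f, 0, fixed));
        List<String> out = new ArrayList<>();
        for (int i = fixed; i + groupSize <= f.length; i += groupSize) {
            out.add(prefix + "," + String.join(",", Arrays.copyOfRange(f, i, i + groupSize)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : expand("A,B,C,D,F1,F2,G1,G2,H1,H2", 4, 2)) {
            System.out.println(row);
        }
        // A,B,C,D,F1,F2
        // A,B,C,D,G1,G2
        // A,B,C,D,H1,H2
    }
}
```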
Thanks,
Rob