re_weavers
Contributor II

Help! How to denormalise arbitrary numbers of multiple columns

Hi,
I'm struggling to get one of our ETL jobs working as flexibly as is needed.
We have a data file like so:
A,B,C,D,F1,F2,G1,G2,H1,H2 and so on (in reality there are 9 columns per "set" rather than the 2 in this example).
There is an arbitrary number of column pairs on the end, and the count differs on every row.
We essentially want to produce something like this:
A,B,C,D,F1,F2
A,B,C,D,G1,G2
A,B,C,D,H1,H2
I've tried using tFileInputRegex to produce A,B,C,D,pair_string columns and then used tDenormalize, but tDenormalize only appears to work on a single column, so we'd end up with:
A,B,C,D,F1
A,B,C,D,F2
Using a really wide table and then using split row is fine, except that we have to "hard-code" a maximum number of pairs.
It feels like I need some sort of tNormalizeMultipleColumns or tExtractRegexFields_Recursive component to do this.
I'm hoping I've missed something obvious and someone can just go "you need to whack together X,Y,Z and it'll work fine!"
Help!
Thanks,
Rob
9 Replies
alevy
Specialist

I don't think there's an easy way to do this. I'd look at reading the file with tFileInputFullRow, then in tJavaRow splitting the fields into an array and looping through the results, putting them back together in groups but with a different separator between the groups, so that tNormalize can then put each group on a separate row.
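A minimal plain-Java sketch of that regrouping step (the class and method names are illustrative; in the job this logic would live inside a tJavaRow, and the group size of 2 matches the simplified example):

```java
// Regroup a full CSV row: keep the first `fixed` fields comma-separated,
// then keep commas inside each `groupSize`-field block but put semicolons
// between the blocks, so tNormalize can later split on ";".
public class Regroup {
    public static String regroup(String line, int fixed, int groupSize) {
        String[] f = line.split(",", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fixed; i++) {          // fixed prefix: A,B,C,D
            if (i > 0) sb.append(',');
            sb.append(f[i]);
        }
        for (int i = fixed; i < f.length; i += groupSize) {
            sb.append(i == fixed ? ',' : ';');     // ';' between groups only
            for (int j = 0; j < groupSize && i + j < f.length; j++) {
                if (j > 0) sb.append(',');
                sb.append(f[i + j]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(regroup("A,B,C,D,F1,F2,G1,G2,H1,H2", 4, 2));
        // A,B,C,D,F1,F2;G1,G2;H1,H2
    }
}
```

The resulting single pair_string column ("F1,F2;G1,G2;H1,H2") can then be normalized on ";" to get one group per row.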
Anonymous
Not applicable

You could try the Talend Exchange component tSchemaNormalize.
First you need a schema that reflects the full width of your input data; that schema can then be normalized into the one you need.
http://www.talendforge.org/exchange/index.php?eid=817&product=tos&action=view&nav=1,1,1
re_weavers
Contributor II
Author

I don't think there's an easy way to do this. I'd look at reading the file with tFileInputFullRow, then in tJavaRow splitting the fields into an array and looping through the results, putting them back together in groups but with a different separator between the groups, so that tNormalize can then put each group on a separate row.

This is pretty much where I was heading; however, I was trying to find a way to "produce" rows from the tJava or tJavaFlex components, but it seems to be impossible/not well documented. I'll try switching every n-th comma to a semicolon for now and use tNormalize; thanks for the suggestion.
You could try the Talend Exchange component tSchemaNormalize.

Thanks! I'd not seen that, very useful! Unfortunately, as you said, I'd have to set a maximum number of columns, so this won't quite fit the bill.
Many thanks for your input!
Rob
alevy
Specialist

You could try coding a loop into tJavaFlex to create multiple rows (I did get this to work once, but then couldn't make it work the next time I tried), but this is very non-standard and liable to break in later versions.
Anonymous
Not applicable

To make tJavaFlex produce data, you have a few options:
As the "start" component:
This is the easy way: start a loop in the "Begin" section, assign values to the output row in the "Main" section, and close the loop in the "End" section.
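A plain-Java illustration of how those three sections nest when the flex starts the subjob (this is not actual Talend-generated code, just the loop shape it produces; names and row values are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the tJavaFlex "start component" pattern: the Begin section
// opens a loop, the Main section builds one output row per iteration,
// and the End section closes the loop.
public class FlexPattern {
    public static List<String> run() {
        List<String> rows = new ArrayList<>();
        // --- Begin section: open the loop ---
        for (int i = 1; i <= 3; i++) {
            // --- Main section: assign values to the output row ---
            rows.add("row_" + i);
        // --- End section: close the loop ---
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [row_1, row_2, row_3]
    }
}
```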
As part of a flow:
When the flex is part of a flow, the Main section is called once per input row and always produces at least one output row per input row (so filtering rows is not really possible). To add extra rows mid-flow, you need a loop in your Main section that assigns to the output row; this loop gets processed once per input row.

Edit: on second thought, adding rows when the flex is part of a flow is probably not possible. Use a tMap to multiply and then a flex to process.
alevy
Specialist

Like I said, I did manage to get the flex to filter or multiply my rows by putting a for-loop declaration in the Main section and closing it in the End section, but it depends on which other components are in your flow. It worked following tSort but not following tFileInput (because my closing brace was placed after the code closing the input).
re_weavers
Contributor II
Author

Edit: on second thought, adding rows when the flex is part of a flow is probably not possible. Use a tMap to multiply and then a flex to process.

Hiya,
Sorry, I'm totally confused by that; how can you configure a tMap to produce rows?
Thanks,
Anonymous
Not applicable

I have no clue what prevents you from using my suggestion. The mentioned component tNormalizeSchema does EXACTLY what you need.
This component has PDF documentation, and if you take a look at the examples, you will find exactly your use case.
I created this component because I had a use case with 44 columns in 23 sets. You can probably imagine I was totally unable to handle this with the built-in possibilities.
re_weavers
Contributor II
Author

I have no clue what prevents you from using my suggestion. The mentioned component tNormalizeSchema does EXACTLY what you need.
This component has PDF documentation, and if you take a look at the examples, you will find exactly your use case.

Hi jlolling,
Apologies, I might not have explained the situation correctly. I have actually investigated this component (after your suggestion); what's stopping me is that the number of columns in the input schema varies wildly. Some of these files are 60 columns, others are 700; the number of columns is expected to increase over time and *needs* to be dynamic (a requirement set on me by the data provider ¬_¬; I'm currently trying to get them to change it to a more sensible schema, but that's another issue).
Therefore (sadly, I might add!) I can't create the input schema in a generic way; I was hoping I might be able to team it with a dynamic schema (that takes the names from the headings). However, the headings don't go all the way out!
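For what it's worth, the fully dynamic expansion could be sketched as a Talend routine along these lines (names are illustrative; it assumes 4 fixed columns, a comma-delimited full row read via tFileInputFullRow, and a known group size, but no maximum pair count):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Expand one full-row string with `fixed` leading columns plus a variable
// number of `groupSize`-field groups into one output line per group, each
// carrying the fixed prefix. Works for any row width.
public class ExpandPairs {
    public static List<String> expand(String line, int fixed, int groupSize) {
        String[] f = line.split(",", -1);
        String prefix = String.join(",", Arrays.copyOfRange(f, 0, fixed));
        List<String> out = new ArrayList<>();
        for (int i = fixed; i + groupSize <= f.length; i += groupSize) {
            out.add(prefix + "," + String.join(",", Arrays.copyOfRange(f, i, i + groupSize)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : expand("A,B,C,D,F1,F2,G1,G2,H1,H2", 4, 2)) {
            System.out.println(row);
        }
        // A,B,C,D,F1,F2
        // A,B,C,D,G1,G2
        // A,B,C,D,H1,H2
    }
}
```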
Thanks,
Rob