<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to normalize a row into multiple rows from the names of columns? in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306991#M78472</link>
    <description>Hi Vaibhav, 
&lt;BR /&gt;Thanks for your answer. However, the service names are not actually "service1", "service2"; they do not increment by one each time. As an example, the services could be named "Paris","Red","January","Samsung","Obama". The logic to increment a count variable by one and append it to the string "service" would not work in this case. Do you know of a way to introspect the schema at run time such that an array could be formed dynamically based on the schema? In that case, your count method would still work as it could access the array for the correct column. 
&lt;BR /&gt;Thanks, 
&lt;BR /&gt;Matthew Moisen</description>
    <pubDate>Wed, 13 Aug 2014 20:35:48 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2014-08-13T20:35:48Z</dc:date>
    <item>
      <title>How to normalize a row into multiple rows from the names of columns?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306989#M78470</link>
      <description>I have an input csv with a format like the following:
&lt;BR /&gt;
&lt;PRE&gt;id, service1, service2, service3&lt;BR /&gt;x, 123, 234, 345&lt;BR /&gt;y, 456, 567, 678&lt;BR /&gt;&lt;/PRE&gt;
&lt;BR /&gt;I would like to produce an output with the following format:
&lt;BR /&gt;
&lt;PRE&gt;id, type, qty&lt;BR /&gt;x, service1, 123&lt;BR /&gt;x, service2, 234&lt;BR /&gt;x, service4, 345&lt;BR /&gt;y, service1, 456&lt;BR /&gt;y, service2, 567&lt;BR /&gt;y, service3, 678&lt;BR /&gt;&lt;/PRE&gt;
&lt;BR /&gt;In other words, I need to take a row with many attributes and normalize it into multiple rows, where the value of the "type" column in the output is equal to the name of the column in the input, and the value of the "quantity" column in the output is equal to the value of the name of the column in the input. I would prefer to do this by introspecting the schema of the input to determine the value of the "type" attribute in the output schema. Each input row is normalized into multiple output rows.
&lt;BR /&gt;This is my current solution:
&lt;BR /&gt;tFileInputDelimited -&amp;gt; tJavaRow -&amp;gt; tNormalize -&amp;gt; tExtractDelimitedFields -&amp;gt; tLogrow
&lt;BR /&gt;The tJavaRow has the following schema (id, line) and its code is as such:
&lt;BR /&gt;
&lt;PRE&gt;output_row.id = input_row.id;&lt;BR /&gt;output_row.line = "service1:" + input_row.service1 + ";service2:" + input_row.service2 + ";service3:" + input_row.service3;&lt;BR /&gt;&lt;/PRE&gt;
&lt;BR /&gt;The tFileInputDelimited has the schema (id, service1, service2, service3), as noted above.
&lt;BR /&gt;The tNormalizeRow normalizes the column "line" using the item separator ";".
&lt;BR /&gt;The tExtractDelimitedFields has the following schema (id, type, quantity), the field to split is "line", and the field separator is ":".
&lt;BR /&gt;This works, but I don't like doing it this way. If a new service, such as "service4" is added, I will have to change the schema of the tFileInputDelimited, and change the code in tJavaRow to account for the -- ";service4:" + input_row.service4; -- part. If a service is removed, I would have to remove it in both spots. In reality there are many services, so the tJavaRow will get very long.
&lt;BR /&gt;I would prefer a solution that could introspect the schema of the tFileInputDelimited, and assume that any column other than "id" (or additional, specified attributes) is a service that should be normalized into the "type" column.
&lt;BR /&gt;What is the best way to accomplish this using Talend?</description>
      <pubDate>Wed, 13 Aug 2014 02:06:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306989#M78470</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-08-13T02:06:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to normalize a row into multiple rows from the names of columns?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306990#M78471</link>
      <description>Hi, 
&lt;BR /&gt;Separate your flow in two steps. First separate your type fields by ';' operator using 
&lt;BR /&gt;tfileinputrow--&amp;gt;String handling (replace all occurances of ',' by ';' except first)--&amp;gt;tfileoutput 
&lt;BR /&gt;Your data would look like 
&lt;BR /&gt;x, 123; 234; 345 
&lt;BR /&gt;y, 456; 567; 678 
&lt;BR /&gt;use following flow to generate service1,service2..etc 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MEid.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/150415i56C6BE245616EC8D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MEid.png" alt="0683p000009MEid.png" /&gt;&lt;/span&gt; 
&lt;BR /&gt;Following code was used in tJavarow 
&lt;BR /&gt; 
&lt;FONT color="#3333ff"&gt;//Code generated according to input schema and output schema&lt;BR /&gt;output_row.id = input_row.id;&lt;BR /&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#3333ff"&gt;if (id_tMemorizeRows_1== id_tMemorizeRows_1)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; context.cnt = context.cnt+1;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; output_row.Services = "Services"+context.cnt.toString();&lt;BR /&gt;&lt;/FONT&gt; 
&lt;BR /&gt; 
&lt;BR /&gt; 
&lt;FONT color="#3333ff"&gt;}&lt;BR /&gt;else if (id_tMemorizeRows_1!= id_tMemorizeRows_1)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; context.cnt =&amp;nbsp; 1;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; output_row.Services = "Services"+context.cnt.toString();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;}&lt;BR /&gt;output_row.Type = input_row.Type;&lt;/FONT&gt; 
&lt;BR /&gt;Thanks 
&lt;BR /&gt;vaibhav</description>
      <pubDate>Wed, 13 Aug 2014 09:49:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306990#M78471</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-08-13T09:49:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to normalize a row into multiple rows from the names of columns?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306991#M78472</link>
      <description>Hi Vaibhav, 
&lt;BR /&gt;Thanks for your answer. However, the service names are not actually "service1", "service2"; they do not increment by one each time. As an example, the services could be named "Paris","Red","January","Samsung","Obama". The logic to increment a count variable by one and append it to the string "service" would not work in this case. Do you know of a way to introspect the schema at run time such that an array could be formed dynamically based on the schema? In that case, your count method would still work as it could access the array for the correct column. 
&lt;BR /&gt;Thanks, 
&lt;BR /&gt;Matthew Moisen</description>
      <pubDate>Wed, 13 Aug 2014 20:35:48 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306991#M78472</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-08-13T20:35:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to normalize a row into multiple rows from the names of columns?</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306992#M78473</link>
      <description>&lt;BLOCKQUOTE&gt; 
 &lt;TABLE border="1"&gt; 
  &lt;TBODY&gt; 
   &lt;TR&gt; 
    &lt;TD&gt;Hi,&lt;BR /&gt;Separate your flow in two steps. First separate your type fields by ';' operator using&lt;BR /&gt;tfileinputrow--&amp;gt;String handling (replace all occurances of ',' by ';' except first)--&amp;gt;tfileoutput&lt;BR /&gt;Your data would look like&lt;BR /&gt;x, 123; 234; 345&lt;BR /&gt;y, 456; 567; 678&lt;BR /&gt;use following flow to generate service1,service2..etc&lt;BR /&gt;&lt;BR /&gt;Following code was used in tJavarow&lt;BR /&gt;&lt;FONT color="#3333ff"&gt;//Code generated according to input schema and output schema&lt;BR /&gt;output_row.id = input_row.id;&lt;BR /&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT color="#3333ff"&gt;if (id_tMemorizeRows_1== id_tMemorizeRows_1)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; context.cnt = context.cnt+1;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; output_row.Services = "Services"+context.cnt.toString();&lt;BR /&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT color="#3333ff"&gt;}&lt;BR /&gt;else if (id_tMemorizeRows_1!= id_tMemorizeRows_1)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; context.cnt =&amp;nbsp; 1;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; output_row.Services = "Services"+context.cnt.toString();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;}&lt;BR /&gt;output_row.Type = input_row.Type;&lt;/FONT&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;vaibhav&lt;/TD&gt; 
   &lt;/TR&gt; 
  &lt;/TBODY&gt; 
 &lt;/TABLE&gt; 
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Tue, 05 Jul 2016 14:53:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-normalize-a-row-into-multiple-rows-from-the-names-of/m-p/2306992#M78473</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-07-05T14:53:06Z</dc:date>
    </item>
  </channel>
</rss>

