<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Working with Bigdata Pig in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Working-with-Bigdata-Pig/m-p/2210519#M8916</link>
    <description></description>
    <pubDate>Wed, 19 Dec 2012 11:24:39 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2012-12-19T11:24:39Z</dc:date>
    <item>
      <title>Working with Bigdata Pig</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Working-with-Bigdata-Pig/m-p/2210519#M8916</link>
      <description>Hi! 
&lt;BR /&gt;I tried to use the Talend tPig* components with Hadoop, but ran into a few problems I could not solve. Maybe there is a way I have not found? If anyone has a workaround or any positive experience working with "Talend Pig", I'd like to hear about it.
&lt;BR /&gt;The following are some of the problems. 
&lt;BR /&gt; 
&lt;BR /&gt;UNION is absent. 
&lt;BR /&gt;Pig example: 
&lt;BR /&gt;-- the table "/calls" contains phone calls.
&lt;BR /&gt;-- I want a table that unions incoming and outgoing calls.
&lt;BR /&gt;call = LOAD '/calls' USING PigStorage(';') AS (subsfrom: chararray, substo: chararray, date: chararray); 
&lt;BR /&gt;C1 = FOREACH call GENERATE subsfrom, date, 'OUT' as direction:chararray; 
&lt;BR /&gt;C2 = FOREACH call GENERATE substo, date, 'IN' as direction:chararray; 
&lt;BR /&gt;U = UNION C1, C2; 
&lt;BR /&gt;STORE U INTO '/call1' USING PigStorage(';'); 
&lt;BR /&gt;I didn't find any way to implement this example with the Talend tPig* components.
&lt;BR /&gt; 
&lt;BR /&gt;tPigLoad has only one output.
&lt;BR /&gt;The example is the same. Even if there were a "union" component, I would have to use two identical tPigLoad components, or more than two if I needed to use this table more than twice.
&lt;BR /&gt; 
&lt;BR /&gt;tPigAggregate has limited functionality.
&lt;BR /&gt;See the example above: I need to generate a simple constant column, "direction". I would like to do this with tPigAggregate, but the only way I found was to use tPigCode.
&lt;BR /&gt; 
&lt;BR /&gt;tPigJoin has only one input.
&lt;BR /&gt;Pig example: 
&lt;BR /&gt;phone = LOAD '/phone' USING PigStorage(';') AS (subs: chararray, churn: int); 
&lt;BR /&gt;call = LOAD '/calls' USING PigStorage(';') AS (subsfrom: chararray, substo: chararray, date: chararray, type: chararray); 
&lt;BR /&gt;-- Suppose we want the list of SMS messages from customers with the flag phone.churn = 0
&lt;BR /&gt;fphone = FILTER phone BY (churn==0); 
&lt;BR /&gt;fcall = FILTER call BY (type=='sms'); 
&lt;BR /&gt;J = JOIN fphone BY subs, fcall BY subsfrom; 
&lt;BR /&gt;But tPigJoin accepts only one filtered table as input, not two (or three, four, etc.).
&lt;BR /&gt; 
&lt;BR /&gt;tPigCross has only one input.
&lt;BR /&gt;The example is similar to the previous one.
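&lt;BR /&gt;A minimal sketch of what I mean, reusing the fphone and fcall aliases from the tPigJoin example above (the alias X is only a placeholder): in plain Pig, CROSS takes both filtered relations as inputs in one statement.
&lt;BR /&gt;-- X is a hypothetical alias; fphone and fcall are the filtered relations defined above
&lt;BR /&gt;X = CROSS fphone, fcall;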
&lt;BR /&gt; 
&lt;BR /&gt;There is no way to use SPLIT. 
&lt;BR /&gt;Example: 
&lt;BR /&gt;SPLIT churn INTO churnvoice IF type=='voice', churnsms IF type=='sms', churnmms IF type=='mms', churngprs IF type=='gprs', churnussd IF type=='ussd', churn0 OTHERWISE; 
&lt;BR /&gt;Of course, we could try to use tPigFilter instead, but because tPig components have only one output, this requires six separate chains of tPigLoad + tPigFilter components.
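&lt;BR /&gt;To illustrate, here is a rough sketch of the first two branches written as filters in plain Pig. I assume churn is an already loaded relation with a chararray field named type, as in the SPLIT statement above. In Pig, one alias can feed every filter; the remaining branches follow the same pattern.
&lt;BR /&gt;-- churn is loaded once and reused by each FILTER
&lt;BR /&gt;churnvoice = FILTER churn BY type == 'voice';
&lt;BR /&gt;churnsms = FILTER churn BY type == 'sms';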
&lt;BR /&gt; 
&lt;BR /&gt;Where is LIMIT? 
&lt;BR /&gt;I expected to find it in the tPigStoreResult component, but it isn't there.
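&lt;BR /&gt;For reference, a minimal sketch of the plain Pig equivalent, reusing the relation U from the UNION example above. The alias U100, the row count 100, and the output path '/call1_sample' are made-up values for illustration.
&lt;BR /&gt;-- keep at most 100 rows of U (which rows are kept is not guaranteed without ORDER BY)
&lt;BR /&gt;U100 = LIMIT U 100;
&lt;BR /&gt;STORE U100 INTO '/call1_sample' USING PigStorage(';');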
&lt;BR /&gt; 
&lt;BR /&gt;And what about authentication? 
&lt;BR /&gt;The tHDFS* components (for example, tHDFSConnection) have username fields, but the tPig* components have nothing similar. As a result, I can work with Hadoop only when authentication is switched off there.
&lt;BR /&gt; 
&lt;BR /&gt;Thanks,
&lt;BR /&gt;-- Dmitriy.</description>
      <pubDate>Wed, 19 Dec 2012 11:24:39 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Working-with-Bigdata-Pig/m-p/2210519#M8916</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2012-12-19T11:24:39Z</dc:date>
    </item>
  </channel>
</rss>

