<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read one file in parallel in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287329#M60934</link>
    <description>Sure, thanks for your help.</description>
    <pubDate>Mon, 04 Jul 2016 04:31:51 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2016-07-04T04:31:51Z</dc:date>
    <item>
      <title>Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287323#M60928</link>
      <description>&lt;P&gt;Experts, could you please help me implement a solution to read a file in parallel? For example, I have a 10 GB file and I want multiple partitions to read it at the same time. Is that possible?&lt;/P&gt;</description>
      <pubDate>Sun, 12 Jun 2016 23:10:35 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287323#M60928</guid>
      <dc:creator>_AnonymousUser</dc:creator>
      <dc:date>2016-06-12T23:10:35Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287324#M60929</link>
      <description>Hi,
&lt;BR /&gt;
You could use a sequence in tMap to break up your file into smaller chunks. What kind of data do you have in this file?
&lt;BR /&gt;
Do you want to load your big file into a database? Could you please give us more information about your current job situation?
&lt;BR /&gt;
Best regards
&lt;BR /&gt;
Sabrina</description>
      <pubDate>Fri, 17 Jun 2016 09:16:44 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287324#M60929</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-06-17T09:16:44Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287325#M60930</link>
      <description>I receive full-refresh files from my source team that contain 160M records. Because each file is a full refresh, I have to read it, compare it with the previously loaded data, identify the inserts, updates and deletes, and apply that delta to the DB table. As an example, take the data below, where customer_id is the PK.
&lt;BR /&gt;
&lt;BR /&gt;Today's file contains:
&lt;BR /&gt;customer_id&amp;nbsp;&amp;nbsp; customer_name 
&lt;BR /&gt;100&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Sam 
&lt;BR /&gt;102&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Alex 
&lt;BR /&gt;105&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; David &amp;nbsp; 
&lt;BR /&gt; 
&lt;BR /&gt;previously loaded table 
&lt;BR /&gt;100&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Sam 
&lt;BR /&gt;102&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Alexy 
&lt;BR /&gt;104&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; John 
&lt;BR /&gt; 
&lt;BR /&gt;With the above data, I need to mark customer_id 102 for update, 105 for insert and 104 for delete. In other words, the final table must end up with the content of the latest file.
&lt;BR /&gt;I don't want to truncate and reload the table because clients use it almost all the time. I can implement the delta-detection logic with tMap; the problem is processing 160M records, which takes a lot of time. Sample file content is posted below.
&lt;BR /&gt;In the file below, the first two columns are the PK.
&lt;BR /&gt; 
&lt;BR /&gt;6014|A26904c676|0.0186370|61 
&lt;BR /&gt;6014|A27da32789|0.0154096|55 
&lt;BR /&gt;6014|A287f20d2c|0.0219631|55 
&lt;BR /&gt;6014|A2dfe8c97e|0.0408455|61 
&lt;BR /&gt;6014|A3b52342f8|0.0243586|61 
&lt;BR /&gt;6014|A3e7ac480f|0.0260668|61 
&lt;BR /&gt;6014|A5abde4f3b|0.0398880|55 
&lt;BR /&gt;6014|A5c54eed1b|0.0293591|55 
&lt;BR /&gt;6014|A5e4e4d111|0.0312439|61 
&lt;BR /&gt;6014|X14b34ecd508|0.0263314|61 
&lt;BR /&gt;6014|X14b34ecd529|0.0263314|61 
&lt;BR /&gt;6014|X14b34ecd53c|0.0263314|61 
&lt;BR /&gt;6014|X14b34ecd594|0.0464095|61 
&lt;BR /&gt;6014|X14b3f396fa8|0.0163314|58 
&lt;BR /&gt;6014|X14b53d31504|0.0207230|58 
&lt;BR /&gt;6014|X14c174dc981|0.0311294|55 
&lt;BR /&gt;6014|X14c174dc9f6|0.0224165|55 
&lt;BR /&gt;6014|X14c2be79613|0.0270148|55 
</description>
      <pubDate>Sat, 18 Jun 2016 21:56:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287325#M60930</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-06-18T21:56:10Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287326#M60931</link>
      <description>The way to do this is to load the records into another table and carry out the comparison in the database. To find both deleted and new records in Talend, you would need to carry out two lookups using a tMap. A lookup comparison over that many records in a tMap is going to be slow even on a really powerful system. Java is nowhere near as fast as a database for comparisons.</description>
      <pubDate>Sat, 18 Jun 2016 23:06:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287326#M60931</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-06-18T23:06:15Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287327#M60932</link>
      <description>But my main problem is reading the 160M records from the file into a table. How can I parallelize that? Another ETL tool, Informatica, has the concept of partitions: it splits a big file into logical partitions and reads them in parallel. Does Talend have something like that?</description>
      <pubDate>Wed, 22 Jun 2016 09:39:42 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287327#M60932</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-06-22T09:39:42Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287328#M60933</link>
      <description>With Talend you are not limited to what Talend itself provides. You can also make use of third-party Java APIs and command-line functionality. So, if you are working in a Linux environment, you can use split (
&lt;A href="http://askubuntu.com/questions/54579/how-to-split-larger-files-into-smaller-parts" rel="nofollow noopener noreferrer"&gt;http://askubuntu.com/questions/54579/how-to-split-larger-files-into-smaller-parts&lt;/A&gt;). If you are not (or if you don't want to use split), you can use a bit of Java to split the file (
&lt;A href="http://stackoverflow.com/questions/19177994/java-read-file-and-split-into-multiple-files" rel="nofollow noopener noreferrer"&gt;http://stackoverflow.com/questions/19177994/java-read-file-and-split-into-multiple-files&lt;/A&gt;).
&lt;BR /&gt; 
&lt;BR /&gt;Processing in parallel may be a problem if you do not have the Enterprise Edition. Parallelization is one of the "paid for" features, but that doesn't stop you from doing this in parallel in the Open Source Edition. You can simply create a job that reads a file (name supplied by a context variable) and then run it as many times concurrently as your system can handle. This won't be the elegant solution that you get with the Enterprise Edition, but since the aim is simply to get the data loaded (I am assuming), it shouldn't matter.</description>
      <pubDate>Wed, 22 Jun 2016 18:48:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287328#M60933</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-06-22T18:48:28Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287329#M60934</link>
      <description>Sure, thanks for your help.</description>
      <pubDate>Mon, 04 Jul 2016 04:31:51 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287329#M60934</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-07-04T04:31:51Z</dc:date>
    </item>
    <item>
      <title>Re: Read one file in parallel</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287330#M60935</link>
      <description>Hi bibintjohn1,
&lt;BR /&gt;
&lt;BR /&gt;You can do this using the Enterprise Edition; otherwise, the other option is to do it manually. You can split your file using one job and then execute multiple jobs in parallel, one per split file.
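&lt;BR /&gt;
&lt;BR /&gt;The manual approach could be driven from outside Talend roughly as follows. This is only a sketch under assumptions: it is a standalone Python script, the names run_job and run_jobs_in_parallel are invented for the example, and the command that launches your job (command_prefix) is whatever your exported job actually needs. Passing the file name as the last argument is a placeholder; a real job would more likely receive it through a context variable.
&lt;PRE&gt;
```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command_prefix, chunk_path):
    """Run one job process for one chunk file and return its exit code."""
    # Assumption: the chunk path is simply appended as the last argument.
    # Adapt this to however your job expects to receive the file name.
    completed = subprocess.run(list(command_prefix) + [chunk_path])
    return completed.returncode


def run_jobs_in_parallel(command_prefix, chunk_paths, max_workers):
    """Run the job once per chunk, with at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as chunk_paths.
        return list(pool.map(lambda p: run_job(command_prefix, p), chunk_paths))
```
&lt;/PRE&gt;
Threads are enough here because each worker only waits for an external process; max_workers is the knob for how many jobs your system can handle concurrently, and a non-zero exit code in the returned list points at the chunk that needs a rerun.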
&lt;BR /&gt;
&lt;BR /&gt;Thanks,
&lt;BR /&gt;Saurabh.</description>
      <pubDate>Mon, 04 Jul 2016 08:24:05 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Read-one-file-parrallely/m-p/2287330#M60935</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-07-04T08:24:05Z</dc:date>
    </item>
  </channel>
</rss>

