<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Batch processing in talend job. in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292888#M65918</link>
    <description>&lt;P&gt;Sorry about the late reply&amp;nbsp;&lt;A href="https://community.qlik.com/s/profile/0053p000007LQMaAAO"&gt;@phancongphuoc&lt;/A&gt;. I hope you had a great new year.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In answer to your question, all you need to do is come up with a way of iterating over the different batches. This will very much depend upon what you are trying to achieve. But let's say you are simply using a tFlowToIterate component to link to the second tJavaFlex which releases the rows per batch. If you take a look at this code (taken from the example above) you will see I hardcoded it to retrieve only batch 0. Change this to use a value set in the globalMap by the tFlowToIterate and that solves your problem.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;//Retrieve a batch from the HashMap. YOU WILL NEED TO MODIFY THIS TO SUIT YOUR REQUIREMENT. I have hard coded it to only batch 0
java.util.ArrayList&amp;lt;row1Struct&amp;gt; array = (java.util.ArrayList&amp;lt;row1Struct&amp;gt;)map.get(0);
&lt;/PRE&gt;</description>
    <pubDate>Wed, 01 Jan 2020 20:32:19 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2020-01-01T20:32:19Z</dc:date>
    <item>
      <title>Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292873#M65903</link>
      <description>&lt;P&gt;Hi team,&lt;/P&gt;
&lt;P&gt;I need to implement batch processing in my Talend job. How can I achieve it? The scenario is as follows.&lt;/P&gt;
&lt;P&gt;Suppose I have 30000 records in my file and I need to process 1000 records at a time; after that, the next 1000 records should be processed, and so on.&lt;/P&gt;
&lt;P&gt;How can I achieve this? 30000 records means 30 batches of records. Please help me with this scenario. I am using Talend Data Fabric 6.4.1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Bhushan&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 04:39:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292873#M65903</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-02-27T04:39:06Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292874#M65904</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Redirect your input to a tFileOutputDelimited.&lt;/P&gt;
&lt;P&gt;Enter the output filename, tick the "Split output in several files" option in the "Advanced settings", and enter 1000 in the "Rows in each output file" field. This will create n files named after the output filename, each holding 1000 rows (the last one holds whatever remains).&lt;/P&gt;
&lt;P&gt;In the next subjob, use a tFileList to iterate over these files and read the records from each one.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 08:56:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292874#M65904</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2018-02-27T08:56:16Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292875#M65905</link>
      <description>&lt;P&gt;Hi TRF,&lt;/P&gt;
&lt;P&gt;Thanks for the reply. I am getting XML files as a source, and each XML file contains 30000 records. I don't want to create multiple input files; I just need to split one XML file into batches and process each batch one by one. Each batch contains 1000 records. How can I divide the records into batches of 1000?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Bhushan.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 09:05:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292875#M65905</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-02-27T09:05:28Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292876#M65906</link>
      <description>&lt;P&gt;You could use arrays or lists (with a little Java code), but the solution I've proposed is the simplest, and you don't need to worry about performance, as tFileInputDelimited and tFileOutputDelimited are very fast.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 09:20:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292876#M65906</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2018-02-27T09:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292877#M65907</link>
      <description>&lt;P&gt;Hi TRF,&lt;/P&gt;
&lt;P&gt;Thanks for the reply. Could you please give me Java code to create batches of records? I am not a Java developer. If another option is possible, please suggest that as well.&lt;/P&gt;
&lt;P&gt;I am not able to split the actual XML file into multiple files because my job does not support it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Bhushan.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 09:29:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292877#M65907</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-02-27T09:29:21Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292878#M65908</link>
      <description>&lt;P&gt;As long as you know how to read your input XML file, you can produce as many CSV files as needed and then consume these files without a single line of Java code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;tFileInputXML --&amp;gt; tFileOutputDelimited (to produce n files)&lt;BR /&gt;|
|(on subjob OK)&lt;BR /&gt;|
tFileList -(iterate over the CSV files list)--&amp;gt; tFileInputDelimited --&amp;gt; next components to process the data&lt;/PRE&gt;&lt;P&gt;This is a good approach, as you don't have to code anything yourself, which matters especially if you're not a Java developer.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 10:00:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292878#M65908</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2018-02-27T10:00:38Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292879#M65909</link>
      <description>&lt;P&gt;Hi TRF,&lt;/P&gt; 
&lt;P&gt;Thanks for the reply. This approach is not suitable because I have more than 1000 source files to process, so I need to divide the records into batches within the job itself and process them one by one.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Thanks,&lt;/P&gt; 
&lt;P&gt;Bhushan.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 10:31:07 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292879#M65909</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-02-27T10:31:07Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292880#M65910</link>
      <description>&lt;P&gt;Well you have a couple of choices, you can either do it all in memory (if you have enough) or you can do as&amp;nbsp;&lt;A href="https://community.qlik.com/s/profile/0053p000007LKj7AAG"&gt;@TRF&lt;/A&gt;&amp;nbsp;suggested and output to a file or a database table (the database table would be my first choice). I'm assuming that in-memory is your preferred choice. In which case you will need to use some Java. I have some which I have used in the past which will definitely work, but you will need to understand Java&amp;nbsp;and the tJavaFlex component to use it.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;First of all, take a look at the Java code generated for the Job. Your components are linked by "rows", and each "row" has a rowStruct class, which is useful here: if your row is called "row1", its rowStruct class will be "row1Struct". You can use this class to store your rows in an ArrayList. To batch your rows up, you can use a combination of ArrayLists inside a HashMap, with one ArrayList per batch. The code below shows how I would write my data in batches of 10 in a tJavaFlex:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Start Code&lt;/P&gt; 
&lt;PRE&gt;//Create the HashMap that maps each batch number to the ArrayList holding that batch's rows
java.util.HashMap&amp;lt;Integer,java.util.ArrayList&amp;lt;row1Struct&amp;gt;&amp;gt; map = new java.util.HashMap&amp;lt;Integer,java.util.ArrayList&amp;lt;row1Struct&amp;gt;&amp;gt;();

//Create a rowCount variable and a currentBatch variable   
int rowCount = 0; 
int currentBatch = 0;  

//Create the ArrayList that will hold the first batch of rows
java.util.ArrayList&amp;lt;row1Struct&amp;gt; array = new java.util.ArrayList&amp;lt;row1Struct&amp;gt;();

&lt;/PRE&gt; 
&lt;P&gt;Main Code&lt;/P&gt; 
&lt;PRE&gt;//If rowCount is a non-zero multiple of 10, the current batch is full: store it in the HashMap,
//increment currentBatch and start a new ArrayList (no copy is needed, because array is replaced straight afterwards)
if(rowCount%10==0 &amp;amp;&amp;amp; rowCount!=0){
	map.put(Integer.valueOf(currentBatch), array);
	currentBatch++;	
	array = new java.util.ArrayList&amp;lt;row1Struct&amp;gt;();
}

//For each row increment the rowCount
rowCount++;

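//Important: create a new row1Struct object for each row and copy your row data into it.
//newColumn and newColumn1 are the columns of my example schema; copy each of your own columns here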
row1Struct tmpRow = new row1Struct();
tmpRow.newColumn = row1.newColumn;
tmpRow.newColumn1 = row1.newColumn1;

//Add your tmpRow to the array
array.add(tmpRow);&lt;/PRE&gt; 
&lt;P&gt;End Code&lt;/P&gt; 
&lt;PRE&gt;//At the end, add the last (possibly partial) batch, which has not been added to the HashMap (map) yet
map.put(Integer.valueOf(currentBatch),array);   

//Add the map to the globalMap to be used later  
globalMap.put("map", map); &lt;/PRE&gt; 
&lt;P&gt;That will store your data in batches of 10 records. To retrieve them batch by batch, use another tJavaFlex like the one below:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Start Code&lt;/P&gt; 
&lt;PRE&gt;//Retrieve the HashMap that the first tJavaFlex stored in the globalMap
java.util.HashMap&amp;lt;Integer,java.util.ArrayList&amp;lt;row1Struct&amp;gt;&amp;gt; map =  (java.util.HashMap&amp;lt;Integer,java.util.ArrayList&amp;lt;row1Struct&amp;gt;&amp;gt;)globalMap.get("map");

//Retrieve a batch from the HashMap. YOU WILL NEED TO MODIFY THIS TO SUIT YOUR REQUIREMENT. I have hard coded it to only batch 0
java.util.ArrayList&amp;lt;row1Struct&amp;gt; array = (java.util.ArrayList&amp;lt;row1Struct&amp;gt;)map.get(0);

//Create an iterator to iterate over the batch
java.util.Iterator&amp;lt;row1Struct&amp;gt; it = array.iterator();

//Start a While loop
while(it.hasNext()){&lt;/PRE&gt; 
&lt;P&gt;Main Code&lt;/P&gt; 
&lt;PRE&gt;//Here I am simply printing a column value from each row, but you can use the data returned here like any other data in a Talend flow.

System.out.println(it.next().newColumn);&lt;/PRE&gt; 
&lt;P&gt;End Code&lt;/P&gt; 
&lt;PRE&gt;//Here we simply close the while loop.
}&lt;/PRE&gt; 
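&lt;P&gt;To see the whole store-then-release idea outside Talend, here is a minimal plain-Java sketch. It is not Talend-generated code and not the exact tJavaFlex code above: the class BatchSketch, the method toBatches and the sample rows are hypothetical, and plain arrays stand in for the HashMap of ArrayLists, with the array index playing the role of the batch key. The final loop visits every batch key in turn instead of hard coding batch 0.&lt;/P&gt;
&lt;PRE&gt;
```java
import java.util.Arrays;

// Hypothetical plain-Java sketch (not Talend-generated code): rows are grouped into
// fixed-size batches numbered 0, 1, 2, ... and then released batch by batch.
// Plain arrays stand in for the HashMap of ArrayLists used in the tJavaFlex example.
class BatchSketch {

    // Split rows into consecutive batches of at most batchSize rows.
    // Batch k holds rows from k * batchSize up to (but excluding) (k + 1) * batchSize,
    // capped at rows.length for the last batch.
    static String[][] toBatches(String[] rows, int batchSize) {
        if (1 > batchSize) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Ceiling division, e.g. 25 rows in batches of 10 gives 3 batches.
        int batchCount = (rows.length + batchSize - 1) / batchSize;
        String[][] batches = new String[batchCount][];
        for (int k = 0; batchCount > k; k++) {
            int from = k * batchSize;
            int to = Math.min(from + batchSize, rows.length);
            batches[k] = Arrays.copyOfRange(rows, from, to);
        }
        return batches;
    }

    public static void main(String[] args) {
        // 25 hypothetical rows, batched in groups of 10 as in the tJavaFlex example.
        String[] rows = new String[25];
        for (int i = 0; rows.length > i; i++) {
            rows[i] = "row" + i;
        }
        String[][] batches = toBatches(rows, 10);
        // Release every batch in key order instead of hard coding batch 0.
        for (int key = 0; batches.length > key; key++) {
            System.out.println("batch " + key + ": " + batches[key].length
                    + " rows, starting with " + batches[key][0]);
        }
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Running main prints one line per batch: three batches of 10, 10 and 5 rows. In a Job, that outer loop over the batch keys is what an Iterate link (for example from a tFlowToIterate) would drive, with the second tJavaFlex releasing the rows of one batch per iteration.&lt;/P&gt;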
&lt;P&gt;Obviously this takes a little bit of code, but it does give you exactly the control you want. The other thing to think about is memory consumption, since all of the rows are held in the HashMap. However, 30000 rows should be easily manageable.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 12:19:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292880#M65910</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-02-27T12:19:57Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292881#M65911</link>
      <description>&lt;P&gt;I can't understand why this approach is not suitable.&lt;/P&gt; 
&lt;P&gt;If you have 1000 input files to process, just add a tFileList before the tFileInputXML, and that's all.&lt;/P&gt; 
&lt;P&gt;Each file will be divided into 1000-record chunks, and each chunk will then be processed (for example loaded into a database, or anything else depending on what you have to do).&lt;/P&gt; 
&lt;P&gt;That's a very common design when you want to deal with a limited and controllable number of records.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 12:28:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292881#M65911</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2018-02-27T12:28:13Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292882#M65912</link>
      <description>Did this help?&lt;BR /&gt;If so, please mark your case as solved.</description>
      <pubDate>Thu, 01 Mar 2018 15:42:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292882#M65912</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2018-03-01T15:42:23Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292883#M65913</link>
      <description>&lt;P&gt;Hi TRF,&lt;/P&gt;
&lt;P&gt;Sorry for the late reply. I have not tried the given solution yet because of urgent deliverables. I will let you know once I have tried it.&lt;/P&gt;
&lt;P&gt;Thanks for the answer mark.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Bhushan&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Mar 2018 04:33:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292883#M65913</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-03-05T04:33:47Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292884#M65914</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I want to create a Talend job where my database table has more than 10,000,000 records, and I want to load all the records into a file.&lt;/P&gt;&lt;P&gt;The approach below takes 5-6 hours:&lt;/P&gt;&lt;P&gt;tOracleInput --&amp;gt; tFileOutputDelimited&lt;/P&gt;&lt;P&gt;Can anybody help me load the data faster?&lt;/P&gt;&lt;P&gt;Can I run the job to load 100 or 1000 rows at a time so that the data loads faster? I have also used tFlowToIterate --&amp;gt; tFixedFlowInput, configured to iterate 100 executions, but the job becomes very slow after a certain time.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Oct 2018 15:32:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292884#M65914</guid>
      <dc:creator>sunny3</dc:creator>
      <dc:date>2018-10-10T15:32:19Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292885#M65915</link>
      <description>&lt;P&gt;hi&amp;nbsp;&lt;A href="https://community.qlik.com/s/profile/005390000069RuGAAU"&gt;@rhall&lt;/A&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I read your code, and this is my understanding:&lt;/P&gt; 
&lt;P&gt;- The first tJavaFlex splits the data into many map entries (each entry holds an Integer key and an ArrayList)&lt;/P&gt; 
&lt;P&gt;- The second tJavaFlex consumes those map entries&lt;/P&gt; 
&lt;P&gt;But how do I process the map entries &lt;STRONG&gt;one by one&lt;/STRONG&gt;?&lt;/P&gt; 
&lt;P&gt;Is the link from tJavaFlex1 to tJavaFlex2 an Iterate link (i.e. tJavaFlex1 ---Iterate---&amp;gt; tJavaFlex2)?&lt;/P&gt; 
&lt;P&gt;How can we link them to the next processing step one batch at a time?&lt;/P&gt; 
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2019 03:38:54 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292885#M65915</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-12-20T03:38:54Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292886#M65916</link>
      <description>&lt;P&gt;You basically have the idea. In order to release the data (each ArrayList from the tJavaFlex), you will need to list the HashMap keys and iterate over them, passing each key to the second tJavaFlex. So on each iteration you will release the rows of one ArrayList, in blocks of however many rows you grouped them by.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2019 10:57:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292886#M65916</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-12-20T10:57:19Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292887#M65917</link>
      <description>Hi 
&lt;A href="https://community.qlik.com/s/profile/null"&gt;@rhall&lt;/A&gt;, 
&lt;BR /&gt;so how can we return the batches one by one in the second tJavaFlex? 
&lt;BR /&gt;Currently, I see that each input row from the first tJavaFlex goes directly into the second tJavaFlex.</description>
      <pubDate>Mon, 23 Dec 2019 10:23:17 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292887#M65917</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-12-23T10:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: Batch processing in talend job.</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292888#M65918</link>
      <description>&lt;P&gt;Sorry about the late reply&amp;nbsp;&lt;A href="https://community.qlik.com/s/profile/0053p000007LQMaAAO"&gt;@phancongphuoc&lt;/A&gt;. I hope you had a great new year.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In answer to your question, all you need to do is come up with a way of iterating over the different batches. This will very much depend upon what you are trying to achieve. But let's say you are simply using a tFlowToIterate component, fed with one row per batch key, to link to the second tJavaFlex, which releases the rows of each batch. If you look at the code below (taken from the example above), you will see I hard coded it to retrieve only batch 0. Change this to use the value set in the globalMap by the tFlowToIterate, and that solves your problem.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;PRE&gt;//Retrieve a batch from the HashMap. YOU WILL NEED TO MODIFY THIS TO SUIT YOUR REQUIREMENT. I have hard coded it to only batch 0
java.util.ArrayList&amp;lt;row1Struct&amp;gt; array = (java.util.ArrayList&amp;lt;row1Struct&amp;gt;)map.get(0);
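//For example (hypothetical names): suppose the tFlowToIterate receives one row per batch key on a row called "row3",
//with the key in an Integer column called "batchKey". With its default key/value settings, the tFlowToIterate stores
//the current key in the globalMap as "row3.batchKey", so the hard coded lookup above could become:
//java.util.ArrayList&amp;lt;row1Struct&amp;gt; array = (java.util.ArrayList&amp;lt;row1Struct&amp;gt;)map.get((Integer)globalMap.get("row3.batchKey"));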
&lt;/PRE&gt;</description>
      <pubDate>Wed, 01 Jan 2020 20:32:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Batch-processing-in-talend-job/m-p/2292888#M65918</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-01-01T20:32:19Z</dc:date>
    </item>
  </channel>
</rss>

