<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic large csv file import in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252204#M35913</link>
    <description>&lt;P&gt;I have many large files, each containing more than 2 million rows of .CSV data.&lt;/P&gt;
&lt;P&gt;I need to import them into a MySQL table gradually, because I cannot load a whole CSV file into memory.&lt;/P&gt;
&lt;P&gt;Could you guide me on which component to use for that?&lt;/P&gt;
&lt;P&gt;I tried the tFileInputDelimited component, but it reads the whole file and consumes all of my laptop's memory.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Sep 2019 11:01:15 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2019-09-19T11:01:15Z</dc:date>
    <item>
      <title>large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252204#M35913</link>
      <description>&lt;P&gt;I have many large files, each containing more than 2 million rows of .CSV data.&lt;/P&gt;
&lt;P&gt;I need to import them into a MySQL table gradually, because I cannot load a whole CSV file into memory.&lt;/P&gt;
&lt;P&gt;Could you guide me on which component to use for that?&lt;/P&gt;
&lt;P&gt;I tried the tFileInputDelimited component, but it reads the whole file and consumes all of my laptop's memory.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 11:01:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252204#M35913</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-19T11:01:15Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252205#M35914</link>
      <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LQMaAAO"&gt;@phancongphuoc&lt;/A&gt;&amp;nbsp;, if you are getting a memory error, you can increase the JVM heap size by following the link below.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCr1pCAC" target="_blank"&gt;https://community.talend.com/t5/Installing-and-Upgrading/Configure-to-use-a-JVM/td-p/112893&lt;/A&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;You can also try the tMySqlBulkExec component to improve performance. To learn more about&amp;nbsp;tMySqlBulkExec, see the link below.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://help.talend.com/reader/jomWd_GKqAmTZviwG_oxHQ/YhYqawgnulVXpdzE1lJ6cg" target="_blank" rel="nofollow noopener noreferrer"&gt;https://help.talend.com/reader/jomWd_GKqAmTZviwG_oxHQ/YhYqawgnulVXpdzE1lJ6cg&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 11:09:12 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252205#M35914</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2019-09-19T11:09:12Z</dc:date>
    </item>
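The tMySqlBulkExec suggestion above relies on MySQL's bulk-load path rather than row-by-row inserts, which is why it avoids holding the whole file in client memory. As a rough illustration of the idea (not Talend's exact generated code), the component effectively issues a LOAD DATA statement like the one built below; the table name, file path, and separators are hypothetical examples.

```python
# Sketch of the statement a bulk-load component such as tMySqlBulkExec
# effectively issues. Table name, file path, and separators below are
# illustrative assumptions, not values taken from this thread.

def build_load_data_sql(csv_path, table, field_sep=";", line_sep="\\n"):
    """Build a MySQL LOAD DATA LOCAL INFILE statement for a CSV file."""
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' "
        f"INTO TABLE {table} "
        f"FIELDS TERMINATED BY '{field_sep}' "
        f"LINES TERMINATED BY '{line_sep}'"
    )

print(build_load_data_sql("/tmp/chunk_0.csv", "my_table"))
```

With LOAD DATA, the server streams the file contents directly into the table, so neither the client job nor the JVM needs to materialize all rows at once.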
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252206#M35915</link>
      <description>&lt;P&gt;You could also loop over your tFileInputDelimited with the Limit parameter set to the maximum number of rows you are able to manage at once (for example 250,000), and set the Header parameter dynamically based on the loop index to skip previously imported rows.&lt;/P&gt; 
&lt;P&gt;You could also split your big file into smaller chunk files and iterate over the list of created files.&lt;/P&gt; 
&lt;P&gt;If you connect tFileInputFullRow to tFileOutputDelimited and set the advanced parameter "Split output in several files" to the number of lines you are able to manage at once, that should be an easier solution.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 12:52:42 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252206#M35915</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2019-09-19T12:52:42Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252207#M35916</link>
      <description>&lt;P&gt;Hello &lt;SPAN&gt;Manohar&lt;/SPAN&gt;,&lt;/P&gt;&lt;P&gt;I started with Talend only about 3 hours ago. Could you please tell me which I should use for my case, TOS_DI or TOS_BD?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 02:56:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252207#M35916</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-20T02:56:00Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252208#M35917</link>
      <description>&lt;P&gt;Dear TRF,&lt;/P&gt;&lt;P&gt;It seems your suggestion is very appropriate for my case.&lt;/P&gt;&lt;P&gt;Could you guide me through the steps to do that? I am very new to TOS.&lt;/P&gt;&lt;P&gt;Many thanks&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 03:17:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252208#M35917</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-20T03:17:10Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252209#M35918</link>
      <description>&lt;P&gt;Dear TRF&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The screenshot shows what I am trying to do (&lt;SPAN&gt;connect tFileInputFullRow to tFileOutputDelimited).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But how do I configure the&amp;nbsp;tFileInputFullRow, since it requires a File Name?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Lwei"&gt;Capture1.PNG&lt;/A&gt;</description>
      <pubDate>Fri, 20 Sep 2019 07:59:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252209#M35918</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-20T07:59:33Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252210#M35919</link>
      <description>&lt;P&gt;The job design should be like this:&lt;/P&gt;
&lt;P&gt;tFileInputFullRow --&amp;gt; tFileOutputDelimited (just to create small files - set schema to a single field)&lt;/P&gt;
&lt;P&gt;|&lt;/P&gt;
&lt;P&gt;onSubJobOK&lt;/P&gt;
&lt;P&gt;|&lt;/P&gt;
&lt;P&gt;tFileList --&amp;gt; tFileInputDelimited (with the real schema) --&amp;gt; tMap --&amp;gt; IHSDatabase&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 08:13:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252210#M35919</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2019-09-20T08:13:32Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252211#M35920</link>
      <description>&lt;P&gt;Dear TRF&lt;/P&gt; 
&lt;P&gt;I got an OutOfMemoryError message.&lt;/P&gt; 
&lt;P&gt;I still wonder: shouldn't tFileInputFullRow read row by row, just as tFileInputDelimited does? That would make more sense. Am I right?&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture1.PNG" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M7Og.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/149508i72CF8C8A346A1E5F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M7Og.png" alt="0683p000009M7Og.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Lwen"&gt;Capture1.PNG&lt;/A&gt;</description>
      <pubDate>Fri, 20 Sep 2019 08:24:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252211#M35920</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-20T08:24:58Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252212#M35921</link>
      <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LQMaAAO"&gt;@phancongphuoc&lt;/A&gt;&amp;nbsp;, increase the JVM heap size and see.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;Follow the link below to set the JVM.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCr1pCAC" target="_blank" rel="noopener"&gt;https://community.talend.com/t5/Installing-and-Upgrading/Configure-to-use-a-JVM/td-p/112893&lt;/A&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 08:35:07 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252212#M35921</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2019-09-20T08:35:07Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252213#M35922</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN&gt;Manohar,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am using TOS 64-bit and I have no problem with Java as described in your link.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am still confused about what you mentioned.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 09:59:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252213#M35922</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-20T09:59:21Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252214#M35923</link>
      <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LQMaAAO"&gt;@phancongphuoc&lt;/A&gt;&amp;nbsp;, check the link below.&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://community.qlik.com/s/article/ka03p0000006EqxAAE" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/Memory-issues-when-profiling-large-data-sets-using-indicators/ta-p/24132&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2019 10:25:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252214#M35923</guid>
      <dc:creator>manodwhb</dc:creator>
      <dc:date>2019-09-20T10:25:10Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252215#M35924</link>
      <description>Hi Manohar, 
&lt;BR /&gt;Thanks for your suggestion. I am trying it now. 
&lt;BR /&gt;But do we have a solution for incremental import, as TRF mentioned above? Meaning that we don't load all rows of the file into RAM; instead, we insert a batch of rows, then continue loading the remaining rows.</description>
      <pubDate>Fri, 20 Sep 2019 11:35:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252215#M35924</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-20T11:35:21Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252216#M35925</link>
      <description>&lt;P&gt;&lt;A href="https://community.qlik.com/s/profile/0053p000007LQMaAAO"&gt;@phancongphuoc&lt;/A&gt;&amp;nbsp;your job design doesn't make sense.&lt;/P&gt; 
&lt;P&gt;Here is what I suggest to you:&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="job.png" style="width: 644px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M7TV.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/153095i64A5F90C0729265A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M7TV.png" alt="0683p000009M7TV.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;With the first subjob, you'll split your big file into smaller CSV files, thanks to the "Split output in several files" option.&lt;/P&gt; 
&lt;P&gt;For example, you can generate files of 100,000 records.&lt;/P&gt; 
&lt;P&gt;tFileInputFullRow considers the input file as having a single field called "line". Use the same schema for tFileOutputDelimited, and don't include a header in the output files.&lt;/P&gt; 
&lt;P&gt;For the 2nd subjob, use the tFileList component to iterate over the list of previously generated CSV files.&lt;/P&gt; 
&lt;P&gt;tFileInputDelimited lets you read each CSV file one by one with the desired schema.&lt;/P&gt; 
&lt;P&gt;You have to use the following expression for the filename:&lt;/P&gt; 
&lt;PRE&gt;((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))&lt;/PRE&gt; 
&lt;P&gt;The content of the current file is pushed to the database by the tMysqlOutput component.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Sep 2019 17:09:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252216#M35925</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2019-09-21T17:09:34Z</dc:date>
    </item>
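TRF's split-then-load design above can be mirrored in a few lines of plain Python, which may clarify why it keeps memory bounded: the big file is streamed line by line and flushed into fixed-size chunk files, so only one line is held in RAM at a time. This is a minimal sketch under assumed file paths and an assumed chunk size, not Talend's generated code.

```python
import os

def split_csv(src_path, out_dir, rows_per_chunk=100_000):
    """Split src_path into chunk files of at most rows_per_chunk lines each,
    streaming line by line so the whole file is never held in memory.
    Returns the list of chunk file paths, in order."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    with open(src_path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            if i % rows_per_chunk == 0:  # start a new chunk file
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "chunk_%d.csv" % len(chunk_paths))
                chunk_paths.append(path)
                out = open(path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Each chunk file can then be loaded into MySQL one at a time, which is what the tFileList --> tFileInputDelimited --> tMysqlOutput subjob does in the design above.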
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252217#M35926</link>
      <description>&lt;P&gt;Hi TRF,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I tried your approach, but it seems the tFileInputFullRow component reads all rows of the file before separating them into several files.&lt;/P&gt; 
&lt;P&gt;This means it loads the whole file content into RAM.&lt;/P&gt; 
&lt;P&gt;So I still get the "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space" error, even at the first step of splitting into several files.&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.PNG" style="width: 945px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M7dF.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/153226i173D269A6D8D99E2/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M7dF.png" alt="0683p000009M7dF.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;I also tried increasing the Java heap to more than 3 GB.&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture2.PNG" style="width: 526px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009M7dK.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/139188iD5D608967A434B3B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009M7dK.png" alt="0683p000009M7dK.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;None of these attempts worked.&lt;/P&gt;</description>
      <pubDate>Sun, 22 Sep 2019 05:17:22 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252217#M35926</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2019-09-22T05:17:22Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252218#M35927</link>
      <description>&lt;P&gt;"&lt;SPAN&gt;If you connect tFileInputFullRow to tFileOutputDelimited set the advanced parameter "Split output in several files" to the number of lines your are able to manage at once, it should be an easier solution."&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;Thanks! Your suggestion was useful in my case. Do you know of any limitation or shortcoming of this process?&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;What I am doing is reading a big XML (1GB) as a flat file, updating it and saving it with the same name. I was able to achieve this using the attached flow. But not aware, if there is any shortcoming , that I might face in future due to this.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009LyGY"&gt;Read_Big_XML.png&lt;/A&gt;</description>
      <pubDate>Thu, 02 Apr 2020 22:01:30 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252218#M35927</guid>
      <dc:creator>Tarun2</dc:creator>
      <dc:date>2020-04-02T22:01:30Z</dc:date>
    </item>
    <item>
      <title>Re: large csv file import</title>
      <link>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252219#M35928</link>
      <description>&lt;P&gt;Update to the previous comment: only the 350 MB file processed successfully with that approach. The 1 GB file failed with a Java heap error, and processed only after bumping the JVM to 12 GB. So this is not an ideal solution.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2020 23:21:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/large-csv-file-import/m-p/2252219#M35928</guid>
      <dc:creator>Tarun2</dc:creator>
      <dc:date>2020-04-02T23:21:04Z</dc:date>
    </item>
  </channel>
</rss>

