<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficient Processing and Handling of API Calls for Large Name Dataset in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356867#M122297</link>
    <description>&lt;P&gt;@Lilian Ortiz Costa, loading 900,000 name records into memory and iterating over them one by one will consume a lot of memory. I suggest trying the following:&lt;/P&gt;&lt;P&gt;1- Split the data into smaller subsets, e.g. 5,000 names per file, and then iterate over the files.&lt;/P&gt;&lt;P&gt;data source--main--tFileOutputDelimited&lt;/P&gt;&lt;P&gt;|OnSubjobOk&lt;/P&gt;&lt;P&gt;tFileList--iterate--&amp;gt;tFileInputDelimited--main--tFlowToIterate--iterate--tREST (or tRESTClient)--&amp;gt;out--&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//In the Advanced settings panel of tFileOutputDelimited, select the 'Split output in several files' check box.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2- Allocate more memory to the job execution.&lt;/P&gt;&lt;P&gt;3- Enable parallel execution on the iterate link coming out of tFlowToIterate, so that several names are sent to the API at the same time.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Aug 2023 03:04:34 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-08-16T03:04:34Z</dc:date>
    <item>
      <title>Efficient Processing and Handling of API Calls for Large Name Dataset</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356865#M122295</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a dataset containing 900,000 name records that I need to use as parameters for individual API GET calls. Unfortunately, the API only allows one name record per call, making bulk processing impractical.&lt;/P&gt;&lt;P&gt;I experimented with a smaller subset of 5,000 names, and it took around 15 minutes to complete the processing. To achieve this, I used the tFlowToIterate component. This component selects one name at a time, which is then stored in a context variable and subsequently used as an input parameter for the API call.&lt;/P&gt;&lt;P&gt;If I were to extend this approach to the entire 900,000-name dataset, the processing time would extend to approximately 60 hours. My goal is to distribute this processing time over a period of 7 days.&lt;/P&gt;&lt;P&gt;Additionally, I am seeking guidance on how to handle potential API failures. It would be beneficial to have a strategy in place to identify the names associated with failed API calls, so that they can be reprocessed at a later time.&lt;/P&gt;&lt;P&gt;I appreciate any insights or suggestions you may have regarding an optimized job design for this scenario.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 21:29:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356865#M122295</guid>
      <dc:creator>Artemis_Mercury</dc:creator>
      <dc:date>2024-11-15T21:29:11Z</dc:date>
    </item>
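The scheduling goal in the question comes down to simple arithmetic. A minimal sketch, using only the figures quoted in the post (900,000 names, the poster's own 60-hour estimate, 7 days); the variable names are illustrative and not part of any Talend job:

```python
import math

TOTAL_NAMES = 900_000    # dataset size quoted in the post
ESTIMATED_HOURS = 60     # the poster's own estimate for the full run
DAYS = 7                 # period the work should be spread over

# Round up so that the last day does not end up with leftover names.
names_per_day = math.ceil(TOTAL_NAMES / DAYS)
hours_per_day = ESTIMATED_HOURS / DAYS

print(names_per_day)             # 128572
print(round(hours_per_day, 2))   # 8.57
```

Under these assumptions, each daily run would take a slice of roughly 128,572 names and run for about eight and a half hours.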
    <item>
      <title>Re: Efficient Processing and Handling of API Calls for Large Name Dataset</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356866#M122296</link>
      <description>&lt;P&gt;Hi @Lilian Ortiz Costa&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I suggest you use tRESTClient to make the API call and get both Response and Error output rows from it. Then you can use those outputs to identify the source record and update its status. This way you can filter the source dataset at each Job start to get only the records that weren't processed successfully in previous executions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also suggest you keep the "Die on error" option enabled on tRESTClient and use the OnComponentError trigger starting from this same component to identify fatal errors and also update the source record status.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2023 13:31:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356866#M122296</guid>
      <dc:creator>anselmopeixoto</dc:creator>
      <dc:date>2023-08-15T13:31:59Z</dc:date>
    </item>
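The status-tracking idea in the reply above could look roughly like this outside Talend. It is a plain-Python sketch of the logic only, not the tRESTClient implementation: `process_pending`, the `status` dictionary and `flaky_api` are hypothetical names, and the dictionary stands in for whatever status column the source dataset would carry.

```python
from typing import Callable, Dict, Iterable, List


def process_pending(
    names: Iterable[str],
    status: Dict[str, str],
    call_api: Callable[[str], object],
) -> List[str]:
    """Call the API for every name not yet marked 'ok'; return the names that failed."""
    failed = []
    for name in names:
        if status.get(name) == "ok":
            continue  # already processed successfully in an earlier run
        try:
            call_api(name)
            status[name] = "ok"
        except Exception as exc:
            # Keep the error text so the failure can be inspected later.
            status[name] = "error: " + str(exc)
            failed.append(name)
    return failed


def flaky_api(name: str) -> dict:
    """Hypothetical stand-in for the real GET call; fails for one name."""
    if name == "bob":
        raise RuntimeError("HTTP 500")
    return {"name": name}


names = ["ann", "bob", "cy"]
status: Dict[str, str] = {}

# First run: one call fails and is recorded.
failed_first = process_pending(names, status, flaky_api)

# Second run with a working API: only the failed name is retried.
retried = []
failed_second = process_pending(names, status, retried.append)
```

After the first run, `failed_first` holds only `"bob"`; the second run skips the two names already marked `ok` and retries just that one.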
    <item>
      <title>Re: Efficient Processing and Handling of API Calls for Large Name Dataset</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356867#M122297</link>
      <description>&lt;P&gt;@Lilian Ortiz Costa, loading 900,000 name records into memory and iterating over them one by one will consume a lot of memory. I suggest trying the following:&lt;/P&gt;&lt;P&gt;1- Split the data into smaller subsets, e.g. 5,000 names per file, and then iterate over the files.&lt;/P&gt;&lt;P&gt;data source--main--tFileOutputDelimited&lt;/P&gt;&lt;P&gt;|OnSubjobOk&lt;/P&gt;&lt;P&gt;tFileList--iterate--&amp;gt;tFileInputDelimited--main--tFlowToIterate--iterate--tREST (or tRESTClient)--&amp;gt;out--&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//In the Advanced settings panel of tFileOutputDelimited, select the 'Split output in several files' check box.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2- Allocate more memory to the job execution.&lt;/P&gt;&lt;P&gt;3- Enable parallel execution on the iterate link coming out of tFlowToIterate, so that several names are sent to the API at the same time.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2023 03:04:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Efficient-Processing-and-Handling-of-API-Calls-for-Large-Name/m-p/2356867#M122297</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-08-16T03:04:34Z</dc:date>
    </item>
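Points 1 and 3 of the reply above (split into files of 5,000 names, then call the API in parallel) can be sketched in plain Python. This only illustrates the batching and parallel-call pattern, not how Talend implements it: `chunked`, `run_chunk` and the worker count of 4 are assumptions, and `str.upper` stands in for the real API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(names: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` names."""
    for start in range(0, len(names), size):
        yield list(names[start:start + size])


def run_chunk(chunk: List[str], call_api: Callable[[str], object], workers: int = 4) -> list:
    """Call the API for every name in one chunk, several calls at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input names.
        return list(pool.map(call_api, chunk))


# Small demonstration with generated names.
names = ["name_%d" % i for i in range(12000)]
chunks = list(chunked(names, 5000))
sizes = [len(c) for c in chunks]              # [5000, 5000, 2000]
results = run_chunk(chunks[2][:10], str.upper)
```

With the full dataset, 900,000 names in chunks of 5,000 would give 180 files. The worker count should stay within whatever request rate the API tolerates.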
  </channel>
</rss>

