<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic tDenormalizing taking too long and too much memory to run in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285512#M59134</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am using the tDenormalizing component to denormalize two columns across 1.3 million rows. The job takes more than 2 hours to run and needs 12 GB of RAM. I'd like to know the complexity of the algorithm and whether there is a way to improve performance for high volumes of data.&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Sat, 16 Nov 2024 03:21:17 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-11-16T03:21:17Z</dc:date>
    <item>
      <title>tDenormalizing taking too long and too much memory to run</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285512#M59134</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am using the tDenormalizing component to denormalize two columns across 1.3 million rows. The job takes more than 2 hours to run and needs 12 GB of RAM. I'd like to know the complexity of the algorithm and whether there is a way to improve performance for high volumes of data.&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 03:21:17 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285512#M59134</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T03:21:17Z</dc:date>
    </item>
    <item>
      <title>Re: tDenormalizing taking too long and too much memory to run</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285513#M59135</link>
      <description>&lt;P&gt;Denormalising needs to keep all of the data in memory while scanning your 1.3 million records to see whether any of them are linked. That is not going to be easy or efficient. Could you group the data and split it into chunks before denormalising each chunk? I am sure that would speed this up.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2020 13:27:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285513#M59135</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-02-05T13:27:15Z</dc:date>
    </item>
    <item>
      <title>Re: tDenormalizing taking too long and too much memory to run</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285514#M59136</link>
      <description>&lt;P&gt;I ended up separating out the portion of the data that needed to be denormalized, and that performed better. The algorithm seems to have really high complexity, which I think could be improved.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2020 11:53:06 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285514#M59136</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-02-06T11:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: tDenormalizing taking too long and too much memory to run</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285515#M59137</link>
      <description>&lt;P&gt;Unfortunately, the problem requires that every row be potentially linked to every other row, or to no rows at all. That means everything has to go into memory. With your dataset of 1,300,000 records, you are essentially dealing with 1,690,000,000,000 comparisons (1,300,000 squared). I'm not sure you can avoid that number of comparisons unless you build heuristics into the algorithm, and those depend on knowing the dataset. It's the job of the developer to build in those heuristics.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2020 13:39:39 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tDenormalizing-taking-too-long-and-too-much-memory-to-run/m-p/2285515#M59137</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-02-06T13:39:39Z</dc:date>
    </item>
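    <!--
    Editorial sketch (not part of any post above, and not Talend's actual implementation).
    The thread suggests grouping the data before denormalising and building in heuristics
    that depend on the dataset. A minimal Python sketch of one such heuristic follows,
    assuming rows can only be linked when they share the same key value. Under that
    assumption a single pass with a hash map avoids comparing every row with every other
    row. The names denormalize, key_field, value_field and delimiter are hypothetical and
    chosen only for illustration.

```python
from collections import defaultdict


def denormalize(rows, key_field, value_field, delimiter=","):
    """Group rows by key_field and join each group's value_field values.

    Assumption: rows are linked only when their key_field values are equal.
    """
    groups = defaultdict(list)
    for row in rows:
        # One dictionary lookup per row instead of a scan over all other rows.
        groups[row[key_field]].append(str(row[value_field]))
    # Emit one output row per distinct key, in first-seen order.
    return [
        {key_field: key, value_field: delimiter.join(values)}
        for key, values in groups.items()
    ]


rows = [
    {"id": 1, "tag": "a"},
    {"id": 2, "tag": "b"},
    {"id": 1, "tag": "c"},
]
result = denormalize(rows, "id", "tag")
```

    Memory use in this sketch still grows with the number of distinct keys and values,
    so splitting the input by key range before calling it may help for very large inputs.
    -->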
  </channel>
</rss>

