<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: tDataShuffling - improving performance in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265708#M2099</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Could you post screenshots of your current job design on the forum? They would help us understand your workflow.&lt;/P&gt;
&lt;P&gt;Best regards&lt;/P&gt;
&lt;P&gt;Sabrina&lt;/P&gt;</description>
    <pubDate>Wed, 28 Nov 2018 06:31:53 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2018-11-28T06:31:53Z</dc:date>
    <item>
      <title>tDataShuffling - improving performance</title>
      <link>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265704#M2095</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am using the tDataShuffling component to shuffle a column that is 8 characters long, partitioned on the first 3 characters of the column, e.g.&lt;BR /&gt;&lt;BR /&gt;SELECT field_1, substr(&lt;SPAN&gt;field_1, 1, 3) FROM table_name;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;shuffle column value: py13456&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;partition column value: py1&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This runs very slowly, at 3 rows/s.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The table has around 6 million records. The buffer size of the tDataShuffling component is 100000, with the seed generator set to 12345678.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;At the job level, I have enabled multi-thread execution with Parallelize Buffer Unit Size set to 25000.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Please suggest ways to improve the performance of this component.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 14:02:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265704#M2095</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-11-27T14:02:23Z</dc:date>
    </item>
    <item>
      <title>Re: tDataShuffling - improving performance</title>
      <link>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265705#M2096</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have tried the following:&lt;/P&gt;&lt;P&gt;Cursor: 100000 for tDBInput&lt;/P&gt;&lt;P&gt;rownum &amp;lt; 100000 in the input query&lt;BR /&gt;Max heap size set to 2048M at the job level (Job Run JVM settings)&lt;BR /&gt;&lt;BR /&gt;Is there anything I could do at the tDataShuffling component level?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you kindly reply?&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 15:08:09 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265705#M2096</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-11-27T15:08:09Z</dc:date>
    </item>
    <item>
      <title>Re: tDataShuffling - improving performance</title>
      <link>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265706#M2097</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The job flow is:&lt;/P&gt;
&lt;P&gt;tDBInput (with cursor) ----&amp;gt; tDataShuffling ----&amp;gt; tDBOutput (update operation)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Db input component Cursor: 100000&lt;/P&gt;
&lt;P&gt;Db input query Rownum: 100000&lt;/P&gt;
&lt;P&gt;Shuffling Buffer size: 100000&lt;/P&gt;
&lt;P&gt;Job Multi thread Parallelize Buffer Unit Size: 25000&lt;/P&gt;
&lt;P&gt;Job Min heap space: -Xms1024M&lt;/P&gt;
&lt;P&gt;Job Max heap space: -Xmx4096M&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 15:22:05 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265706#M2097</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-11-27T15:22:05Z</dc:date>
    </item>
    <item>
      <title>Re: tDataShuffling - improving performance</title>
      <link>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265707#M2098</link>
      <description>&lt;P&gt;I have set the DB output batch size to 50000.&lt;/P&gt;
&lt;P&gt;The job has been running for more than 30 minutes and has not completed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Would caching the data with tHashOutput before tDataShuffling improve performance?&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 16:23:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265707#M2098</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-11-27T16:23:32Z</dc:date>
    </item>
    <item>
      <title>Re: tDataShuffling - improving performance</title>
      <link>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265708#M2099</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Could you post screenshots of your current job design on the forum? They would help us understand your workflow.&lt;/P&gt;
&lt;P&gt;Best regards&lt;/P&gt;
&lt;P&gt;Sabrina&lt;/P&gt;</description>
      <pubDate>Wed, 28 Nov 2018 06:31:53 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/tDataShuffling-improving-performance/m-p/2265708#M2099</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-11-28T06:31:53Z</dc:date>
    </item>
  </channel>
</rss>

