<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Subquery - using where &lt;column_name&gt; in (select &lt;column_name&gt; from... in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Subquery-using-where-lt-column-name-gt-in-select-lt-column-name/m-p/2359262#M124117</link>
    <description>Hello&amp;nbsp; 
&lt;BR /&gt;I would suggest not caching the data in memory with tHashOutput for a large data set: doing so consumes a lot of memory and reduces performance. Filtering the rows directly in the query will be more efficient. 
&lt;BR /&gt; 
&lt;BR /&gt;Regards 
&lt;BR /&gt;Shong</description>
    <pubDate>Tue, 26 Apr 2016 11:36:01 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2016-04-26T11:36:01Z</dc:date>
    <item>
      <title>Subquery - using where &lt;column_name&gt; in (select &lt;column_name&gt; from...</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Subquery-using-where-lt-column-name-gt-in-select-lt-column-name/m-p/2359261#M124116</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I need help designing this job in the most efficient manner.&amp;nbsp;&lt;BR /&gt;---------------------------------------------------------------------------&lt;BR /&gt;I have a main table containing 440,000,000 rows with a schema similar to this:&amp;nbsp;&lt;BR /&gt;Customer ID, Trip ID, Amount, Items Purchased ...&lt;BR /&gt;I want to keep only the rows for customers whose trip count is at least 8.&amp;nbsp;&lt;BR /&gt;One way is to use a subquery:&lt;BR /&gt;Select * from Main_Table where Customer_ID in (select Customer_ID from Main_Table group by Customer_ID having count(Trip_ID)&amp;gt;=8)&lt;BR /&gt;The other way is to create an Aggregated Table with Count_Of_Trips as a column for each Customer_ID, which simplifies the query:&amp;nbsp;&lt;BR /&gt;Select * from Main_Table where Customer_ID in (Select Customer_ID from Aggregated_Table where Count_Of_Trips &amp;gt;=8)&lt;BR /&gt;---------------------------------------------------------------------------&lt;BR /&gt;Currently I load the aggregated data into a tHashOutput component and use it as a lookup against the Main Table in a tMap component. This seems time-consuming.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;What would be the most efficient, least time-consuming job design for the above subquery problem?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Apr 2016 08:33:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Subquery-using-where-lt-column-name-gt-in-select-lt-column-name/m-p/2359261#M124116</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-04-13T08:33:26Z</dc:date>
    </item>
    <item>
      <title>Re: Subquery - using where &lt;column_name&gt; in (select &lt;column_name&gt; from...</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Subquery-using-where-lt-column-name-gt-in-select-lt-column-name/m-p/2359262#M124117</link>
      <description>Hello&amp;nbsp; 
&lt;BR /&gt;I would suggest not caching the data in memory with tHashOutput for a large data set: doing so consumes a lot of memory and reduces performance. Filtering the rows directly in the query will be more efficient. 
&lt;BR /&gt; 
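&lt;BR /&gt;As a minimal sketch of filtering in the query instead of in a tHashOutput/tMap lookup, the Python snippet below runs the IN (... GROUP BY ... HAVING) subquery from the question against an in-memory SQLite table. SQLite, the sample rows, the column subset, and the helper names build_demo_connection and customers_with_min_trips are assumptions for illustration only; the thread does not name the actual database.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory stand-in for Main_Table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Main_Table (Customer_ID INTEGER, Trip_ID INTEGER, Amount REAL)"
    )
    rows = [(1, trip, 10.0) for trip in range(1, 9)]  # customer 1: 8 trips
    rows += [(2, trip, 5.0) for trip in range(1, 4)]  # customer 2: 3 trips
    conn.executemany("INSERT INTO Main_Table VALUES (?, ?, ?)", rows)
    return conn


def customers_with_min_trips(conn, min_trips=8):
    """Return the Main_Table rows of customers with at least min_trips trips.

    The database evaluates the subquery, so no lookup data is cached
    on the client side.
    """
    query = """
        SELECT Customer_ID, Trip_ID, Amount
        FROM Main_Table
        WHERE Customer_ID IN (
            SELECT Customer_ID
            FROM Main_Table
            GROUP BY Customer_ID
            HAVING COUNT(Trip_ID) >= ?
        )
        ORDER BY Customer_ID, Trip_ID
    """
    return conn.execute(query, (min_trips,)).fetchall()


if __name__ == "__main__":
    connection = build_demo_connection()
    for row in customers_with_min_trips(connection):
        print(row)
```

&lt;BR /&gt;In a Talend job, the same SQL would presumably go into the query of the database input component, so that the database does the filtering. Whether this beats the aggregated-table variant depends on the database and its indexes, so it is worth timing both.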
&lt;BR /&gt;Regards 
&lt;BR /&gt;Shong</description>
      <pubDate>Tue, 26 Apr 2016 11:36:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Subquery-using-where-lt-column-name-gt-in-select-lt-column-name/m-p/2359262#M124117</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-04-26T11:36:01Z</dc:date>
    </item>
  </channel>
</rss>

