Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Subquery - using where <column_name> in (select <column_name> from...

Hi,
Need help in creating the job in the most efficient manner. 
---------------------------------------------------------------------------
I have a main table containing 440,000,000 rows with schema similar to this - 
Customer ID, Trip ID, Amount, Items Purchased ...
I want to filter rows where Count of trips for a customer id is more than 8. 
One way is to use the query - Select * from Main_Table where Customer_ID in (select Customer_ID from Main_Table group by Customer_ID having count(Trip_ID)>=8)
Other way is to create an Aggregated Table with Count_Of_Trips as a column for a given Customer_ID - which simplifies the query - 
Select * from Main_Table where Customer_ID in (Select Customer_ID from Aggregated_Table where Count_Of_Trips >=8)
---------------------------------------------------------------------------
Now I take aggregated data in to a tHashOutput component and use it for lookup on the Main Table using tMap Component. This seems time consuming. 

What would be the most efficient, minimal time consuming job designs or steps for the above subquery problem?

Labels (2)
1 Reply
Anonymous
Not applicable
Author

Hello 
I would suggest you not to cache the data in memory with tHashOutput for a large of data set, this way consumes much memory and reduce the performance. Filter the rows directly in the query will be efficient.

Regards
Shong