Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
Need help in creating the job in the most efficient manner.
---------------------------------------------------------------------------
I have a main table containing 440,000,000 rows with schema similar to this -
Customer ID, Trip ID, Amount, Items Purchased ...
I want to filter rows where Count of trips for a customer id is more than 8.
One way is to use the query - Select * from Main_Table where Customer_ID in (select Customer_ID from Main_Table group by Customer_ID having count(Trip_ID)>=8)
Other way is to create an Aggregated Table with Count_Of_Trips as a column for a given Customer_ID - which simplifies the query -
Select * from Main_Table where Customer_ID in (Select Customer_ID from Aggregated_Table where Count_Of_Trips >=8)
---------------------------------------------------------------------------
Now I take aggregated data in to a tHashOutput component and use it for lookup on the Main Table using tMap Component. This seems time consuming.
What would be the most efficient, minimal time consuming job designs or steps for the above subquery problem?