
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Need More Efficient "Group By" Load
Hi Everybody,
I'm trying to find a more efficient way to create a table that sums and groups records by a certain key. I have the following data that I want to load:
Order_Number | Product | Date | Order_Quantity | Quantity_Filled |
100001 | Product_1 | 06/01/2012 | 29 | 20 |
100001 | Product_2 | 06/01/2012 | 13 | 13 |
100001 | Product_3 | 06/01/2012 | 14 | 3 |
100002 | Product_1 | 07/10/2012 | 11 | 11 |
100002 | Product_2 | 07/10/2012 | 11 | 11 |
100002 | Product_3 | 07/10/2012 | 17 | 17 |
100003 | Product_1 | 07/12/2012 | 28 | 25 |
100003 | Product_3 | 07/12/2012 | 18 | 0 |
I will only need to look at this data by Product and Month. I don't need it broken down by Order_Number or a specific date.
The only way I know how to do this is to create separate tables, make a Product_Date_Key, create another table to sum the data and group by the key, add and drop tables, etc. This works fine but I'd like to have it be more efficient if possible.
The actual data that I'm using is thousands of products on thousands of orders for the last two years.
Accepted Solutions


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following scripts will work:
Initial_Load:
load *, MonthName(Date) as Month;
LOAD * INLINE [
Order_Number, Product, Date, Order_Quantity, Quantity_Filled
100001, Product_1, 41061, 29, 20
100001, Product_2, 41061, 13, 13
100001, Product_3, 41061, 14, 3
100002, Product_1, 41100, 11, 11
100002, Product_2, 41100, 11, 11
100002, Product_3, 41100, 17, 17
100003, Product_1, 41102, 28, 25
100003, Product_3, 41102, 18, 0
];
Data:
LOAD Month, Product,Sum(Order_Quantity) as OrdQty, Sum(Quantity_Filled) as FilledQty Resident Initial_Load Group By Month, Product;
Drop Table Initial_Load;


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following scripts will work:
Initial_Load:
load *, MonthName(Date) as Month;
LOAD * INLINE [
Order_Number, Product, Date, Order_Quantity, Quantity_Filled
100001, Product_1, 41061, 29, 20
100001, Product_2, 41061, 13, 13
100001, Product_3, 41061, 14, 3
100002, Product_1, 41100, 11, 11
100002, Product_2, 41100, 11, 11
100002, Product_3, 41100, 17, 17
100003, Product_1, 41102, 28, 25
100003, Product_3, 41102, 18, 0
];
Data:
LOAD Month, Product,Sum(Order_Quantity) as OrdQty, Sum(Quantity_Filled) as FilledQty Resident Initial_Load Group By Month, Product;
Drop Table Initial_Load;

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Perfect! Thanks!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello
I would do any aggregations on the load script, and I would let QV do them for you.
I uploaded the same file and got to the same results without changing the load script, I think it will work faster and will give you more flexibility in the future
I attach the qvw example
Hope this helps
Regards,

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks.
That's how I was doing it originally but the problem was that I was loading too much data that I didn't need. I only ever need to know the total quantity for a product by month. I never used quantities by order number or by what day of the month so this was slowing the file down as I was loading thousands of different products across thousands of orders.
