Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear Experts,
I have two tables - one with sales data and another with item promotions data. There is no distinct unique key between the tables.
For each date in the sales table, I want to pull up the promo details. I am doing this with Intervalmatch() function as explained in another post. Script I have used is given below.
POS_DATA:
LOAD
TrxDate, ITEM_CODE, TRX_DATETIME, QTY_SOLD, SALES_XTAX, STORECODE, STORECODE&'-'&ITEM_CODE as [PT_Link]
FROM [lib://Project/POS.qvd] (qvd);
PROMO_TABLE:
LOAD
PT_Store_Code, PT_Item_Code, PT_Link, PT_Sales_Start_Date, PT_Sales_End_Date, PT_Update_Date, PT_Update_User, PT_User_Type, PT_Promo_Sales_Price, PT_Reg_Cost_Price
FROM [lib://Project/Promo.xlsx]
(ooxml, embedded labels, table is Sheet1);
Final_Data:
IntervalMatch(TrxDate)
Left Join
Load
[PT_Sales_Start_Date],
[PT_Sales_End_Date]
Resident PROMO_TABLE;
Left Join(POS_DATA)
Load Distinct *
Resident PROMO_TABLE;
drop Table PROMO_TABLE;
This works well for a small data set of one month. However, data load fails when I try larger data sets (3 months). The full sales data table (1 year) has several million rows of data and the promo table has approximately 3 million rows of data.
Please advice on the below:
1. Is Intervalmatch() the right approach? If not, what is the alternate?
2. I have created a field named [PT_Link] in both tables by linking STORECODE & ITEM_CODE fields. However, this is not unique as the combination repeats several times in both tables. So, is this field relevant and should it be kept in the table?
Grateful for any help in this regard. Can provide additional info if it helps.
Thanks in advance,
Sudeep
POS_DATA:
LOAD
TrxDate, ITEM_CODE, TRX_DATETIME, QTY_SOLD, SALES_XTAX, STORECODE, STORECODE&'-'&ITEM_CODE as [PT_Link]
FROM [lib://Project/POS.qvd] (qvd);
PROMO_TABLE:
LOAD PT_Sales_Start_Date + iterno()-1 as TrxDate,
PT_Store_Code, PT_Item_Code, PT_Link, PT_Sales_Start_Date, PT_Sales_End_Date, PT_Update_Date, PT_Update_User, PT_User_Type, PT_Promo_Sales_Price, PT_Reg_Cost_Price
FROM [lib://Project/Promo.xlsx]
(ooxml, embedded labels, table is Sheet1)
while PT_Sales_Start_Date + iterno()-1<= PT_Sales_End_Date;
Sudeep,
I guess intervalmatch() is the right way. However, I don't know why you are doing the last join. And yes, you can remove the the unnecessary link fields.
Thank you Treesco for the reply.
Due to the data volume the load script exits with error code 129 (out of memory). Have tried different methods without success. Would you have any suggestions to handle this issue?
Thanks,
Sudeep
POS_DATA:
LOAD
TrxDate, ITEM_CODE, TRX_DATETIME, QTY_SOLD, SALES_XTAX, STORECODE, STORECODE&'-'&ITEM_CODE as [PT_Link]
FROM [lib://Project/POS.qvd] (qvd);
PROMO_TABLE:
LOAD PT_Sales_Start_Date + iterno()-1 as TrxDate,
PT_Store_Code, PT_Item_Code, PT_Link, PT_Sales_Start_Date, PT_Sales_End_Date, PT_Update_Date, PT_Update_User, PT_User_Type, PT_Promo_Sales_Price, PT_Reg_Cost_Price
FROM [lib://Project/Promo.xlsx]
(ooxml, embedded labels, table is Sheet1)
while PT_Sales_Start_Date + iterno()-1<= PT_Sales_End_Date;
Thanks @Kushal_Chawda
The solution works well on my sample data set. I will run it for a larger volume to see the performance.
I believe the "WHERE" in the last statement is a typo and should be "WHILE".
A synthetic key is being created (TrxDate + PT_Link) in the final output. Will this create any issue in the data linkage? If yes, how can this be avoided?