I have two tables - one with sales data and another with item promotions data. There is no distinct unique key between the tables.
For each date in the sales table, I want to pull up the promo details. I am doing this with Intervalmatch() function as explained in this link. Script I have used is given below.
POS_DATA: LOAD TrxDate, ITEM_CODE, TRX_DATETIME, QTY_SOLD, SALES_XTAX, STORECODE, STORECODE&'-'&ITEM_CODE as [PT_Link] FROM [lib://Project/POS.qvd] (qvd);
PROMO_TABLE: LOAD PT_Store_Code, PT_Item_Code, PT_Link, PT_Sales_Start_Date, PT_Sales_End_Date, PT_Update_Date, PT_Update_User, PT_User_Type, PT_Promo_Sales_Price, PT_Reg_Cost_Price FROM [lib://Project/Promo.xlsx] (ooxml, embedded labels, table is Sheet1);
Final_Data: IntervalMatch(TrxDate) Left Join Load [PT_Sales_Start_Date], [PT_Sales_End_Date] Resident PROMO_TABLE; Left Join(POS_DATA) Load Distinct * Resident PROMO_TABLE; drop Table PROMO_TABLE;
This works well for a small data set of one month. However, data load fails when I try larger data sets (3 months). The full sales data table (1 year) has several million rows of data and the promo table has approximately 3 million rows of data.
Please advice on the below:
1. Is Intervalmatch() the right approach? If not, what is the alternate?
2. I have created a field named [PT_Link] in both tables by linking STORECODE & ITEM_CODE fields. However, this is not unique as the combination repeats several times in both tables. So, is this field relevant and should it be kept in the table?
Grateful for any help in this regard. Can provide additional info if it helps.