Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey Guys -
I am working on something where I know I can get it done through work around, but was wondering if there is a better way to do it. Please note that this is just a sample and the actual data is huge + the script needs to be dynamic to handle the requirement.
Sample Data
Table:
LOAD * Inline [
ID, FIELD1, MONTHYEAR, VALUE
1, ABC, 11/01/2015, 10
1, ABC, 12/01/2015, 11
1, ABC, 01/01/2016, 12
1, ABC, 02/01/2016, 11
1, ABC, 03/01/2016, 12
1, DEF, 11/01/2015, 5
1, DEF, 12/01/2015, 4
1, GHI, 11/01/2015, 8
1, GHI, 12/01/2015, 9
2, ABC, 11/01/2015, 10
2, ABC, 12/01/2015, 11
2, ABC, 01/01/2016, 12
2, ABC, 02/01/2016, 11
2, ABC, 03/01/2016, 12
2, ABC, 04/01/2016, 16
2, DEF, 11/01/2015, 5
2, DEF, 12/01/2015, 4
2, GHI, 11/01/2015, 8
2, GHI, 12/01/2015, 9
];
I am attaching the expected output file with the sample application.
Update: Can you guys have a look please swuehl, MarcoWedel, jagan MRKachhiaIMP jontydkpi cleveranjos
Maybe like this:
SET DATEFORMAT = 'MM/DD/YYYY';
Table:
LOAD * Inline [
ID, FIELD1, MONTHYEAR, VALUE
1, ABC, 11/01/2015, 10
1, ABC, 12/01/2015, 11
1, ABC, 01/01/2016, 12
1, ABC, 02/01/2016, 11
1, ABC, 03/01/2016, 12
1, DEF, 11/01/2015, 5
1, DEF, 12/01/2015, 4
1, GHI, 11/01/2015, 8
1, GHI, 12/01/2015, 9
2, ABC, 11/01/2015, 10
2, ABC, 12/01/2015, 11
2, ABC, 01/01/2016, 12
2, ABC, 02/01/2016, 11
2, ABC, 03/01/2016, 12
2, ABC, 04/01/2016, 16
2, DEF, 11/01/2015, 5
2, DEF, 12/01/2015, 4
2, GHI, 11/01/2015, 8
2, GHI, 12/01/2015, 9
];
JOIN (Table)
LOAD ID, FIELD1, Max(MONTHYEAR) as MaxIDField
Resident Table
GROUP BY ID, FIELD1;
JOIN (Table)
LOAD ID, Max(MONTHYEAR) as MaxID
Resident Table
GROUP BY ID;
CONCATENATE (Table)
LOAD ID, FIELD1, Date(AddMonths(MONTHYEAR,iterno())) as MONTHYEAR, VALUE, MaxIDField, MaxID
Resident Table
WHILE AddMonths(MONTHYEAR,iterno()) <= MaxID and MONTHYEAR = MaxIDField;
// DROP FIELDS MaxIDField, MaxID;
Hi Sunny, maybe you have already think (and discarted) this option as a workaround:
1- Create a Key with ID, FIELD1 and MONTHYEAR in your original table
2 - Create a temp table with the VALUE of the max MONTHYEAR for each ID and FIELD1
3 - Create a 2nd temp table with all the possible relations between ID, FIELD1 and MONTHYEAR... This is the one that can cause performance issues for huge data, maybe you can limit combinations if you can be sure that, if an ID & FIELD1 has value for a date, it has values in previous dates.
4 - Join to this table the values from the first temp table using ID and FIELD1
5 - Concatenate the result table to the first table checking that the key doesn't exists.
Ruben thanks for your response.
I did have a similar idea, but like you said it could be a performance killer. I am looking for an efficient way to do this if possible.
Maybe like this:
SET DATEFORMAT = 'MM/DD/YYYY';
Table:
LOAD * Inline [
ID, FIELD1, MONTHYEAR, VALUE
1, ABC, 11/01/2015, 10
1, ABC, 12/01/2015, 11
1, ABC, 01/01/2016, 12
1, ABC, 02/01/2016, 11
1, ABC, 03/01/2016, 12
1, DEF, 11/01/2015, 5
1, DEF, 12/01/2015, 4
1, GHI, 11/01/2015, 8
1, GHI, 12/01/2015, 9
2, ABC, 11/01/2015, 10
2, ABC, 12/01/2015, 11
2, ABC, 01/01/2016, 12
2, ABC, 02/01/2016, 11
2, ABC, 03/01/2016, 12
2, ABC, 04/01/2016, 16
2, DEF, 11/01/2015, 5
2, DEF, 12/01/2015, 4
2, GHI, 11/01/2015, 8
2, GHI, 12/01/2015, 9
];
JOIN (Table)
LOAD ID, FIELD1, Max(MONTHYEAR) as MaxIDField
Resident Table
GROUP BY ID, FIELD1;
JOIN (Table)
LOAD ID, Max(MONTHYEAR) as MaxID
Resident Table
GROUP BY ID;
CONCATENATE (Table)
LOAD ID, FIELD1, Date(AddMonths(MONTHYEAR,iterno())) as MONTHYEAR, VALUE, MaxIDField, MaxID
Resident Table
WHILE AddMonths(MONTHYEAR,iterno()) <= MaxID and MONTHYEAR = MaxIDField;
// DROP FIELDS MaxIDField, MaxID;
Thanks Stefan. Let me implement this in my original script and see how it works. For the sample it did work as desired.
Could perform slightly better using
...
WHILE MONTHYEAR = MaxIDField AND AddMonths(MONTHYEAR,iterno()) <= MaxID ;
It worked like a charm. Thanks Stefan and Ruben for your advice.
Ruben you advice might have worked the same way, but I did not have the script and I was too scared to work on it on my own. I still need to figure out what Stefan's script it doing, but it works
I will probably on the weekend to figure what exactly it did.
Once again, thanks guys.
Best,
Sunny
Based on swuehl script but joining with a temp table in the 2nd join (with less records maybe performance is better)
Note: One field name is changed 'cause I was doing a sample.
My initial proposal was slightly different from swuehl's, his proposal is better and clever so he deserves the 'correct answer'
If Sunny finds some time to test, would be interesting to see some numbers based on his data.
If you are going to test Ruben's solution, I think the second JOIN can be slightly improved by LOADing data from TempTable:
JOIN (TempTable)
LOAD ID, Max(MaxMONTHYEAR) as MaxID
Resident TempTable
GROUP BY ID;
edit: Well, probably doing this before the LEFT JOIN of the complete Table:
TempTable:
LOAD ID, FIELD1, Date(Max(MONTHYEAR)) as MaxMONTHYEAR Resident Table Group By ID, FIELD1;
JOIN (TempTable)
LOAD ID, Max(MaxMONTHYEAR) as MaxID
Resident TempTable
GROUP BY ID;
Left Join (TempTable)
LOAD ID, FIELD1, MONTHYEAR as MaxMONTHYEAR, VALUE Resident Table;