Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
sunny_talwar

Need help with data manipulation

Hey Guys -

I am working on something where I know I can get it done through work around, but was wondering if there is a better way to do it. Please note that this is just a sample and the actual data is huge + the script needs to be dynamic to handle the requirement.

Sample Data

Table:

LOAD * Inline [

ID, FIELD1, MONTHYEAR, VALUE

1, ABC,  11/01/2015, 10

1, ABC, 12/01/2015, 11

1, ABC, 01/01/2016, 12

1, ABC, 02/01/2016, 11

1, ABC, 03/01/2016, 12

1, DEF, 11/01/2015, 5

1, DEF, 12/01/2015, 4

1, GHI, 11/01/2015, 8

1, GHI, 12/01/2015, 9

2, ABC,  11/01/2015, 10

2, ABC, 12/01/2015, 11

2, ABC, 01/01/2016, 12

2, ABC, 02/01/2016, 11

2, ABC, 03/01/2016, 12

2, ABC, 04/01/2016, 16

2, DEF, 11/01/2015, 5

2, DEF, 12/01/2015, 4

2, GHI, 11/01/2015, 8

2, GHI, 12/01/2015, 9

];


I am attaching the expected output file with the sample application.

Update: Can you guys have a look please swuehl‌‌, MarcoWedel‌, jaganMRKachhiaIMPjontydkpicleveranjos


1 Solution

Accepted Solutions
swuehl
MVP
MVP

Maybe like this:

SET DATEFORMAT = 'MM/DD/YYYY';

Table:

LOAD * Inline [

ID, FIELD1, MONTHYEAR, VALUE

1, ABC,  11/01/2015, 10

1, ABC, 12/01/2015, 11

1, ABC, 01/01/2016, 12

1, ABC, 02/01/2016, 11

1, ABC, 03/01/2016, 12

1, DEF, 11/01/2015, 5

1, DEF, 12/01/2015, 4

1, GHI, 11/01/2015, 8

1, GHI, 12/01/2015, 9

2, ABC,  11/01/2015, 10

2, ABC, 12/01/2015, 11

2, ABC, 01/01/2016, 12

2, ABC, 02/01/2016, 11

2, ABC, 03/01/2016, 12

2, ABC, 04/01/2016, 16

2, DEF, 11/01/2015, 5

2, DEF, 12/01/2015, 4

2, GHI, 11/01/2015, 8

2, GHI, 12/01/2015, 9

];

JOIN (Table)

LOAD ID, FIELD1, Max(MONTHYEAR) as MaxIDField

Resident Table

GROUP BY ID, FIELD1;

JOIN (Table)

LOAD ID, Max(MONTHYEAR) as MaxID

Resident Table

GROUP BY ID;

CONCATENATE (Table)

LOAD ID, FIELD1, Date(AddMonths(MONTHYEAR,iterno())) as MONTHYEAR, VALUE, MaxIDField, MaxID

Resident Table

WHILE  AddMonths(MONTHYEAR,iterno()) <= MaxID and MONTHYEAR = MaxIDField;

// DROP FIELDS MaxIDField, MaxID;

View solution in original post

11 Replies
rubenmarin

Hi Sunny, maybe you have already think (and discarted) this option as a workaround:

1- Create a Key with ID, FIELD1 and MONTHYEAR in your original table

2 - Create a temp table with the VALUE of the max MONTHYEAR for each ID and FIELD1

3 - Create a 2nd temp table with all the possible relations between ID, FIELD1 and MONTHYEAR... This is the one that can cause performance issues for huge data, maybe you can limit combinations if you can be sure that, if an ID & FIELD1 has value for a date, it has values in previous dates.

4 - Join to this table the values from the first temp table using ID and FIELD1

5 - Concatenate the result table to the first table checking that the key doesn't exists.

sunny_talwar
Author

Ruben‌ thanks for your response.

I did have a similar idea, but like you said it could be a performance killer. I am looking for an efficient way to do this if possible.

swuehl
MVP
MVP

Maybe like this:

SET DATEFORMAT = 'MM/DD/YYYY';

Table:

LOAD * Inline [

ID, FIELD1, MONTHYEAR, VALUE

1, ABC,  11/01/2015, 10

1, ABC, 12/01/2015, 11

1, ABC, 01/01/2016, 12

1, ABC, 02/01/2016, 11

1, ABC, 03/01/2016, 12

1, DEF, 11/01/2015, 5

1, DEF, 12/01/2015, 4

1, GHI, 11/01/2015, 8

1, GHI, 12/01/2015, 9

2, ABC,  11/01/2015, 10

2, ABC, 12/01/2015, 11

2, ABC, 01/01/2016, 12

2, ABC, 02/01/2016, 11

2, ABC, 03/01/2016, 12

2, ABC, 04/01/2016, 16

2, DEF, 11/01/2015, 5

2, DEF, 12/01/2015, 4

2, GHI, 11/01/2015, 8

2, GHI, 12/01/2015, 9

];

JOIN (Table)

LOAD ID, FIELD1, Max(MONTHYEAR) as MaxIDField

Resident Table

GROUP BY ID, FIELD1;

JOIN (Table)

LOAD ID, Max(MONTHYEAR) as MaxID

Resident Table

GROUP BY ID;

CONCATENATE (Table)

LOAD ID, FIELD1, Date(AddMonths(MONTHYEAR,iterno())) as MONTHYEAR, VALUE, MaxIDField, MaxID

Resident Table

WHILE  AddMonths(MONTHYEAR,iterno()) <= MaxID and MONTHYEAR = MaxIDField;

// DROP FIELDS MaxIDField, MaxID;

sunny_talwar
Author

Thanks Stefan. Let me implement this in my original script and see how it works. For the sample it did work as desired.

swuehl
MVP
MVP

Could perform slightly better using

...

WHILE  MONTHYEAR = MaxIDField  AND AddMonths(MONTHYEAR,iterno()) <= MaxID ;

sunny_talwar
Author

It worked like a charm. Thanks Stefan and Ruben for your advice.

Ruben you advice might have worked the same way, but I did not have the script and I was too scared to work on it on my own. I still need to figure out what Stefan's script it doing, but it works

I will probably on the weekend to figure what exactly it did.

Once again, thanks guys.

Best,

Sunny

rubenmarin

Based on swuehl script but joining with a temp table in the 2nd join (with less records maybe performance is better)

Note: One field name is changed 'cause I was doing a sample.

rubenmarin

My initial proposal was slightly different from swuehl's, his proposal is better and clever so he deserves the 'correct answer'

swuehl
MVP
MVP

If Sunny finds some time to test, would be interesting to see some numbers based on his data.

If you are going to test Ruben's solution, I think the second JOIN can be slightly improved by LOADing data from TempTable:

JOIN (TempTable)

LOAD ID, Max(MaxMONTHYEAR) as MaxID

Resident TempTable

GROUP BY ID;

edit: Well, probably doing this before the LEFT JOIN of the complete Table:

TempTable:

LOAD ID, FIELD1, Date(Max(MONTHYEAR)) as MaxMONTHYEAR Resident Table Group By ID, FIELD1;

JOIN (TempTable)

LOAD ID, Max(MaxMONTHYEAR) as MaxID

Resident TempTable

GROUP BY ID;

Left Join (TempTable)

LOAD ID, FIELD1, MONTHYEAR as MaxMONTHYEAR, VALUE Resident Table;