Using Peek and Previous Functions Recursively to Help Transform Data

    Hey everyone, a large problem with unstructured data is that it's, well, unstructured. One of the most common problems we see here at Qualcomm, especially with testing or engineering data is when you have a table of records that tries to merge cells accross rows in Excel. However when it's read in line by line, only the first record receives a value while remainder records are null.

    For example, take the data set:

    DateBuild IDResult
    1/1/200915
    1/2/2009null3
    1/3/2009null4
    1/4/2009null7
    1/5/200928
    1/6/2009null3
    1/7/2009null2
    1/8/200931
    1/9/2009null9

    As you can see above, when the Build field needs to repeat its value becomes null because it was a row span. Only the first row contained the value and the remaining records, until the following build, are null.

    Introducing the peek() function...

    One method to solve this is to load your data into two nested loads. The first load will create a counter field. Basically it will count the records from 1 to n and then reset itself back to 1 when it encounters the next build. The outer load will then take this counter and apply it to the peek function to retrieve the record that countained the last build number. The code is below:

    LOAD

    if(isnull([Build ID]), peek([Build ID],(Counter)*-1), [Build ID]) as [New Build ID],*;

    LOAD

    if(previous([Build ID]) <> [Build ID], 1, peek(Counter) + 1) as Counter,*;

    LOAD

    Date,
    [Build ID],
    Result

    FROM [Data];

     

    Wallah! Once you run the above code you should receive a table that looks like this:

    CounterDateNew Build IDBuild IDResult
    11/1/2009115
    11/2/20091null3
    21/3/20091null4
    31/4/20091null7
    11/5/2009228
    11/6/20092null3
    21/7/20092null2
    11/8/2009331
    11/9/20093null9

    Hope this helps!

    - Steve