5 Replies. Latest reply: Apr 28, 2016 7:46 AM by Mikael Johansson

    Transforming single row datasets into table

    Mikael Johansson

      Hi all,

       

I have a single-row data file organized in sets in the following format: {set1},{set2},...,{setn}

       

      Example including two sets:

      {"eventId":1,"revision":1,"hash":"4836-1","channel":4836,"gametime":"05:29","timePeriod":329,"gameId":4836,"realTime":"20150916190954","time":"1461760398.0554","period":1,"class":"ShotWide","type":"Skott","description":"45 A Falk","extra":[],"action":"message","source":"api","sourceport":"80","team":"DIF","messagetype":"all","actiontype":"new","location":{"x":148,"y":-44},"status":"new"},{"eventId":2,"revision":1,"hash":"4836-2","channel":4836,"gametime":"05:49","timePeriod":349,"gameId":4836,"realTime":"20150916191014","time":"1461760398.0793","period":1,"class":"Shot","type":"Skott","description":"49 R Rakhshani","extra":[],"action":"message","source":"api","sourceport":"80","team":"LHC","messagetype":"all","actiontype":"new","location":{"x":157,"y":105},"status":"new"}

       

      I want to transform the data into the following table format:

      gameid  eventid  gametime  period  class     description     team  location x  location y
      4836    1        05:29     1       ShotWide  45 A Falk       DIF   148         -44
      4836    2        05:49     1       Shot      49 R Rakhshani  LHC   157         105

       

       

      Any thoughts on how I can achieve this for a file containing around 150 data sets?

       

      Here's a link to the example file with all the data sets: http://www.shl.se/shl-arena/4836/liveevents/0/

       

      Best regards,

      Mikael Johansson

        • Re: Transforming single row datasets into table
          Sunny Talwar

          This can be a first step; try to take it forward from here:

           

          Table:
          // Chr(34) is the double-quote character; PurgeChar strips the quotes and braces.
          LOAD RowNo() as Sort,
            PurgeChar(SubField(NewField, ':', 1), Chr(34)) as Label,
            PurgeChar(SubField(NewField, ':', 2), Chr(34)) as Value;
          // SubField without an index argument generates one row per comma-separated chunk.
          LOAD PurgeChar(SubField(Field, ','), '}{') as NewField,
            Field;
          LOAD * Inline [
          Field
          {"eventId":1,"revision":1,"hash":"4836-1","channel":4836,"gametime":"05:29","timePeriod":329,"gameId":4836,"realTime":"20150916190954","time":"1461760398.0554","period":1,"class":"ShotWide","type":"Skott","description":"45 A Falk","extra":[],"action":"message","source":"api","sourceport":"80","team":"DIF","messagetype":"all","actiontype":"new","location":{"x":148,"y":-44},"status":"new"},{"eventId":2,"revision":1,"hash":"4836-2","channel":4836,"gametime":"05:49","timePeriod":349,"gameId":4836,"realTime":"20150916191014","time":"1461760398.0793","period":1,"class":"Shot","type":"Skott","description":"49 R Rakhshani","extra":[],"action":"message","source":"api","sourceport":"80","team":"LHC","messagetype":"all","actiontype":"new","location":{"x":157,"y":105},"status":"new"}
          ] (delimiter is '\t');

          // Generic pivots the Label/Value pairs into one field per Label;
          // AutoNumber(RowNo(), Label) numbers the records within each Label so the
          // values belonging to one event share the same key.
          FinalTable:
          Generic
          LOAD AutoNumber(RowNo(), Label) as UniqueKey,
            Label,
            Value
          Resident Table
          Order By Label, Sort;

          DROP Table Table;

          // Caveat: chunks containing more than one ':' (e.g. gametime "05:29" or the
          // nested location object) get truncated by SubField(..., ':', 2) - hence "first step".
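          One common way to take this forward: the Generic prefix leaves a separate table per Label (FinalTable.eventId, FinalTable.gametime, and so on). A sketch that joins them back into one wide table; it assumes an eventId label exists, as in the sample:

          CombinedTable:
          LOAD Distinct UniqueKey RESIDENT [FinalTable.eventId];

          // Walk the table list backwards so dropped tables don't shift the
          // indexes of the tables still to be processed.
          FOR i = NoOfTables() - 1 TO 0 STEP -1
            LET vTable = TableName($(i));
            IF SubStringCount('$(vTable)', 'FinalTable.') > 0 THEN
              LEFT JOIN (CombinedTable) LOAD * RESIDENT [$(vTable)];
              DROP Table [$(vTable)];
            ENDIF
          NEXT i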

           

          [Attached screenshot: Capture.PNG]

          • Re: Transforming single row datasets into table
            Marcus Sommer

            If there weren't categories with multiple values, like location - {"x":148,"y":-44} - and possibly further commas within certain values (the description, I assume), then simple code like this would be enough:

             

            // Each load splits one level further: records on '},{', key/value
            // pairs on ',', then Category and Value on ':'.
            LOAD SubField(C, ':', 1) as Category, SubField(C, ':', 2) as Value, RowNo() as RowNo;
            LOAD SubField(B, ',') as C;
            LOAD SubField(A, '},{') as B;
            LOAD @1 as A
            FROM [http://www.shl.se/shl-arena/4836/liveevents/0/] (txt, codepage is 1252, no labels, delimiter is '\t');

             

            Catching these and maybe further exceptions is more complicated, needs more effort, and probably requires a deeper investigation of the data structure, which I couldn't carry out.

             

            I would probably change the load strategy from the preceding loads used above to a chain of resident loads, in which I would try to identify and replace the special cases - with various string functions like Replace/Index/Mid/Left/Right and so on - before doing the next record split with SubField.
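            A minimal sketch of that resident-load idea. The table names are made up, and it assumes location is the only nested object and is always followed by the status field, as in the sample:

            Raw:
            LOAD @1 as A
            FROM [http://www.shl.se/shl-arena/4836/liveevents/0/] (txt, codepage is 1252, no labels, delimiter is '\t');

            // Flatten the nested location object before any comma split, so that
            // "location":{"x":148,"y":-44} becomes two ordinary key/value pairs.
            Cleaned:
            LOAD Replace(Replace(Replace(A,
              '"location":{"x":', '"location x":'),
              ',"y":', ',"location y":'),
              '},"status"', ',"status"') as A
            RESIDENT Raw;

            // The split chain above can then run on Cleaned; taking the value as
            // Mid(C, Index(C, ':') + 1) instead of SubField(C, ':', 2) also keeps
            // values that contain a colon themselves (e.g. gametime "05:29") intact.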

             

            If this and maybe some other ideas aren't successful, I would change the record splitting to a while-loop and join the resulting incomplete parts together based on RecNo/RowNo/IterNo. Before taking this (worst-case) step, I would try to change the source to get a unique delimiter for each data part.
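            A rough sketch of the while-loop variant, reusing the hypothetical Raw table from the previous sketch (re-joining incomplete parts is left out):

            Parts:
            LOAD RecNo() as Rec,
              IterNo() as Iter,
              SubField(A, '},{', IterNo()) as Part
            RESIDENT Raw
            WHILE IterNo() <= SubStringCount(A, '},{') + 1;

            // Each iteration emits one record chunk; Rec and Iter identify chunks
            // that were split in the wrong place so they can be concatenated again.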

             

            - Marcus