3 Replies Latest reply: Feb 13, 2015 5:33 AM by Philipp Wesseling RSS

    Sort, parse and identify

    Philipp Wesseling

      Hi all,

       

      i have the following (simplified) data set:

       

      LogInIDUserIDLogInDateLeftPostAnsweredPost
      1AA2015-01-1010
      2AA2015-01-1210
      3BB2015-01-1201
      4AA2015-01-1701
      5BB2015-01-2211
      6CC2015-01-2201

       

      What i would like to do for every UserID:

       

      • Add a flag "HasAnswerAttachedToPost" = 1 to every record in my data source that tells me if a LogIn has either
        • a LeftPost = 1 AND an AnsweredPost=1
        • OR
        • a LogIn L1 that has a LeftPost=1 AND an AnsweredPost=0 that is followed by a LogIn L2 that has LeftPost=0 AND AnsweredPost=1 within a timeframe of max 10 days after L1

       

      This would result in:

       

      LogInIDUserIDLogInDateLeftPostAnsweredPostHasAnswerAttachedToPost
      1AA2015-01-1010
      2AA2015-01-1210
      3BB2015-01-1201
      4AA2015-01-17011
      5BB2015-01-22111
      6CC2015-01-2201

       

       

      How can this be done in a smart way ?

       

      Thanks!

        • Re: Sort, parse and identify
          Gysbert Wassenaar

          Maybe like this:

           

          Temp:
          LOAD * INLINE [
              LogInID, UserID, LogInDate, LeftPost, AnsweredPost
              1, AA, 2015-01-10, 1, 0
              2, AA, 2015-01-12, 1, 0
              3, BB, 2015-01-12, 0, 1
              4, AA, 2015-01-17, 0, 1
              5, BB, 2015-01-22, 1, 1
              6, CC, 2015-01-22, 0, 1
          ];
          
          NOCONCATENATE
          
          Result:
          LOAD *, if(AnsweredPost=1 and ( LeftPost=1 or (LeftPost=0 and LogInDate - LastLeftPost <=10)),1,0) as HasAnswerAttachedToPost;
          LOAD *, if(LeftPost=1,LogInDate, if(UserID=previous(UserID), peek(LastLeftPost))) as LastLeftPost
          RESIDENT Temp
          ORDER BY UserID, LogInDate
          ;
          
          DROP TABLE Temp;
          
            • Re: Sort, parse and identify
              Philipp Wesseling

              Thank you very much Mr. Wassenaar!

               

              After reviewing your code and trying it on my "real" data, i stumbled over two more challanges that i am struggeling with. If you take a look at this (closer to the real) data sample:

               

              LogInIDUserIDLogInDateLeftPostAnsweredPostHasUploadedContent
              1AA2015-01-10100
              2AA2015-01-12100
              3BB2015-01-12100
              4AA2015-01-17011
              5AA2015-01-18011
              6BB2015-01-12100
              7BB2015-01-15011

               

               

              What i would like to achieve is:

               

              For every LogIn of a user for one day where the user has left a post, i would like to determine if this user has also answered a post and / or uploaded content either on the day of the post or within 10 days after the post. (if these login records have a LeftPost flag = 0 - in these cases i consider the original post(s) to have an answer attached and / or an upload attached.

               

              Following the example above the my result data set would look like this:

               

               

              LogInIDUserIDLogInDateLeftPostAnsweredPostHasAnswerAttachedToPostHasUploadedContentHasContentAttachedToPost
              1AA2015-01-1010100
              2AA2015-01-1010101
              3BB2015-01-1210100
              4AA2015-01-1701Null()0Null()
              5AA2015-01-2101Null()1Null()
              6BB2015-01-1210111
              7BB2015-01-1501Null()1Null()

               

              I guess, the difficulties i have in particular are:

               

              - Set the two new fields HasAnswerAttachedToPost and HasContentAttachedToPost properly in case a user has two or more LogInRecords with LeftPost=1, but AnsweredPost=0 and HasUploadedContent=0 on the same day. In that case i need to consider all of these records when checking for following LogIns of this user for potential Answer and / or Upload attachments where LeftPost = 0.

               

              Hope, my problem is clear & thanks a lot in advance for any help, hints or suggestions!

            • Re: Sort, parse and identify

              Here you go - Have a look at the following load script

               

              USERDATA:
              LOAD * INLINE [
              LogInID, UserID, LogInDate, LeftPost, AnsweredPost
              1, AA, 2015-01-10, 1, 0
              2, AA, 2015-01-12, 1, 0
              3, BB, 2015-01-12, 0, 1
              4, AA, 2015-01-17, 0, 1
              5, BB, 2015-01-22, 1, 1
              6, CC, 2015-01-22, 0, 1
              ]
              ;

              NoConcatenate
              RESULT:
              Load
              LogInID,
              UserID,
              LogInDate,
              LeftPost,
              AnsweredPost,
              if(AnsweredPost=1 and LeftPost=0,1) as HasAnsweredAttachedToPost
              Resident USERDATA;

              Drop Table
              USERDATA;

               

              Have Fun!!