Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Sort, parse and identify

Hi all,

i have the following (simplified) data set:

LogInIDUserIDLogInDateLeftPostAnsweredPost
1AA2015-01-1010
2AA2015-01-1210
3BB2015-01-1201
4AA2015-01-1701
5BB2015-01-2211
6CC2015-01-2201

What i would like to do for every UserID:

  • Add a flag "HasAnswerAttachedToPost" = 1 to every record in my data source that tells me if a LogIn has either
    • a LeftPost = 1 AND an AnsweredPost=1
    • OR
    • a LogIn L1 that has a LeftPost=1 AND an AnsweredPost=0 that is followed by a LogIn L2 that has LeftPost=0 AND AnsweredPost=1 within a timeframe of max 10 days after L1

This would result in:

LogInIDUserIDLogInDateLeftPostAnsweredPostHasAnswerAttachedToPost
1AA2015-01-1010
2AA2015-01-1210
3BB2015-01-1201
4AA2015-01-17011
5BB2015-01-22111
6CC2015-01-2201

How can this be done in a smart way ?

Thanks!

1 Solution

Accepted Solutions
Gysbert_Wassenaar

Maybe like this:

Temp:

LOAD * INLINE [

    LogInID, UserID, LogInDate, LeftPost, AnsweredPost

    1, AA, 2015-01-10, 1, 0

    2, AA, 2015-01-12, 1, 0

    3, BB, 2015-01-12, 0, 1

    4, AA, 2015-01-17, 0, 1

    5, BB, 2015-01-22, 1, 1

    6, CC, 2015-01-22, 0, 1

];

NOCONCATENATE

Result:

LOAD *, if(AnsweredPost=1 and ( LeftPost=1 or (LeftPost=0 and LogInDate - LastLeftPost <=10)),1,0) as HasAnswerAttachedToPost;

LOAD *, if(LeftPost=1,LogInDate, if(UserID=previous(UserID), peek(LastLeftPost))) as LastLeftPost

RESIDENT Temp

ORDER BY UserID, LogInDate

;

DROP TABLE Temp;


talk is cheap, supply exceeds demand

View solution in original post

3 Replies
Gysbert_Wassenaar

Maybe like this:

Temp:

LOAD * INLINE [

    LogInID, UserID, LogInDate, LeftPost, AnsweredPost

    1, AA, 2015-01-10, 1, 0

    2, AA, 2015-01-12, 1, 0

    3, BB, 2015-01-12, 0, 1

    4, AA, 2015-01-17, 0, 1

    5, BB, 2015-01-22, 1, 1

    6, CC, 2015-01-22, 0, 1

];

NOCONCATENATE

Result:

LOAD *, if(AnsweredPost=1 and ( LeftPost=1 or (LeftPost=0 and LogInDate - LastLeftPost <=10)),1,0) as HasAnswerAttachedToPost;

LOAD *, if(LeftPost=1,LogInDate, if(UserID=previous(UserID), peek(LastLeftPost))) as LastLeftPost

RESIDENT Temp

ORDER BY UserID, LogInDate

;

DROP TABLE Temp;


talk is cheap, supply exceeds demand
Not applicable
Author

Here you go - Have a look at the following load script

USERDATA:
LOAD * INLINE [
LogInID, UserID, LogInDate, LeftPost, AnsweredPost
1, AA, 2015-01-10, 1, 0
2, AA, 2015-01-12, 1, 0
3, BB, 2015-01-12, 0, 1
4, AA, 2015-01-17, 0, 1
5, BB, 2015-01-22, 1, 1
6, CC, 2015-01-22, 0, 1
]
;

NoConcatenate
RESULT:
Load
LogInID,
UserID,
LogInDate,
LeftPost,
AnsweredPost,
if(AnsweredPost=1 and LeftPost=0,1) as HasAnsweredAttachedToPost
Resident USERDATA;

Drop Table
USERDATA;

Have Fun!!

Not applicable
Author

Thank you very much Mr. Wassenaar!

After reviewing your code and trying it on my "real" data, i stumbled over two more challanges that i am struggeling with. If you take a look at this (closer to the real) data sample:

LogInIDUserIDLogInDateLeftPostAnsweredPostHasUploadedContent
1AA2015-01-10100
2AA2015-01-12100
3BB2015-01-12100
4AA2015-01-17011
5AA2015-01-18011
6BB2015-01-12100
7BB2015-01-15011

What i would like to achieve is:

For every LogIn of a user for one day where the user has left a post, i would like to determine if this user has also answered a post and / or uploaded content either on the day of the post or within 10 days after the post. (if these login records have a LeftPost flag = 0 - in these cases i consider the original post(s) to have an answer attached and / or an upload attached.

Following the example above the my result data set would look like this:

LogInIDUserIDLogInDateLeftPostAnsweredPostHasAnswerAttachedToPostHasUploadedContentHasContentAttachedToPost
1AA2015-01-1010100
2AA2015-01-1010101
3BB2015-01-1210100
4AA2015-01-1701Null()0Null()
5AA2015-01-2101Null()1Null()
6BB2015-01-1210111
7BB2015-01-1501Null()1Null()

I guess, the difficulties i have in particular are:

- Set the two new fields HasAnswerAttachedToPost and HasContentAttachedToPost properly in case a user has two or more LogInRecords with LeftPost=1, but AnsweredPost=0 and HasUploadedContent=0 on the same day. In that case i need to consider all of these records when checking for following LogIns of this user for potential Answer and / or Upload attachments where LeftPost = 0.

Hope, my problem is clear & thanks a lot in advance for any help, hints or suggestions!