Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
i have the following (simplified) data set:
LogInID | UserID | LogInDate | LeftPost | AnsweredPost |
---|---|---|---|---|
1 | AA | 2015-01-10 | 1 | 0 |
2 | AA | 2015-01-12 | 1 | 0 |
3 | BB | 2015-01-12 | 0 | 1 |
4 | AA | 2015-01-17 | 0 | 1 |
5 | BB | 2015-01-22 | 1 | 1 |
6 | CC | 2015-01-22 | 0 | 1 |
What i would like to do for every UserID:
This would result in:
LogInID | UserID | LogInDate | LeftPost | AnsweredPost | HasAnswerAttachedToPost |
---|---|---|---|---|---|
1 | AA | 2015-01-10 | 1 | 0 | |
2 | AA | 2015-01-12 | 1 | 0 | |
3 | BB | 2015-01-12 | 0 | 1 | |
4 | AA | 2015-01-17 | 0 | 1 | 1 |
5 | BB | 2015-01-22 | 1 | 1 | 1 |
6 | CC | 2015-01-22 | 0 | 1 |
How can this be done in a smart way ?
Thanks!
Maybe like this:
Temp:
LOAD * INLINE [
LogInID, UserID, LogInDate, LeftPost, AnsweredPost
1, AA, 2015-01-10, 1, 0
2, AA, 2015-01-12, 1, 0
3, BB, 2015-01-12, 0, 1
4, AA, 2015-01-17, 0, 1
5, BB, 2015-01-22, 1, 1
6, CC, 2015-01-22, 0, 1
];
NOCONCATENATE
Result:
LOAD *, if(AnsweredPost=1 and ( LeftPost=1 or (LeftPost=0 and LogInDate - LastLeftPost <=10)),1,0) as HasAnswerAttachedToPost;
LOAD *, if(LeftPost=1,LogInDate, if(UserID=previous(UserID), peek(LastLeftPost))) as LastLeftPost
RESIDENT Temp
ORDER BY UserID, LogInDate
;
DROP TABLE Temp;
Maybe like this:
Temp:
LOAD * INLINE [
LogInID, UserID, LogInDate, LeftPost, AnsweredPost
1, AA, 2015-01-10, 1, 0
2, AA, 2015-01-12, 1, 0
3, BB, 2015-01-12, 0, 1
4, AA, 2015-01-17, 0, 1
5, BB, 2015-01-22, 1, 1
6, CC, 2015-01-22, 0, 1
];
NOCONCATENATE
Result:
LOAD *, if(AnsweredPost=1 and ( LeftPost=1 or (LeftPost=0 and LogInDate - LastLeftPost <=10)),1,0) as HasAnswerAttachedToPost;
LOAD *, if(LeftPost=1,LogInDate, if(UserID=previous(UserID), peek(LastLeftPost))) as LastLeftPost
RESIDENT Temp
ORDER BY UserID, LogInDate
;
DROP TABLE Temp;
Here you go - Have a look at the following load script
USERDATA:
LOAD * INLINE [
LogInID, UserID, LogInDate, LeftPost, AnsweredPost
1, AA, 2015-01-10, 1, 0
2, AA, 2015-01-12, 1, 0
3, BB, 2015-01-12, 0, 1
4, AA, 2015-01-17, 0, 1
5, BB, 2015-01-22, 1, 1
6, CC, 2015-01-22, 0, 1
];
NoConcatenate
RESULT:
Load
LogInID,
UserID,
LogInDate,
LeftPost,
AnsweredPost,
if(AnsweredPost=1 and LeftPost=0,1) as HasAnsweredAttachedToPost
Resident USERDATA;
Drop Table
USERDATA;
Have Fun!!
Thank you very much Mr. Wassenaar!
After reviewing your code and trying it on my "real" data, i stumbled over two more challanges that i am struggeling with. If you take a look at this (closer to the real) data sample:
LogInID | UserID | LogInDate | LeftPost | AnsweredPost | HasUploadedContent |
---|---|---|---|---|---|
1 | AA | 2015-01-10 | 1 | 0 | 0 |
2 | AA | 2015-01-12 | 1 | 0 | 0 |
3 | BB | 2015-01-12 | 1 | 0 | 0 |
4 | AA | 2015-01-17 | 0 | 1 | 1 |
5 | AA | 2015-01-18 | 0 | 1 | 1 |
6 | BB | 2015-01-12 | 1 | 0 | 0 |
7 | BB | 2015-01-15 | 0 | 1 | 1 |
What i would like to achieve is:
For every LogIn of a user for one day where the user has left a post, i would like to determine if this user has also answered a post and / or uploaded content either on the day of the post or within 10 days after the post. (if these login records have a LeftPost flag = 0 - in these cases i consider the original post(s) to have an answer attached and / or an upload attached.
Following the example above the my result data set would look like this:
LogInID | UserID | LogInDate | LeftPost | AnsweredPost | HasAnswerAttachedToPost | HasUploadedContent | HasContentAttachedToPost |
---|---|---|---|---|---|---|---|
1 | AA | 2015-01-10 | 1 | 0 | 1 | 0 | 0 |
2 | AA | 2015-01-10 | 1 | 0 | 1 | 0 | 1 |
3 | BB | 2015-01-12 | 1 | 0 | 1 | 0 | 0 |
4 | AA | 2015-01-17 | 0 | 1 | Null() | 0 | Null() |
5 | AA | 2015-01-21 | 0 | 1 | Null() | 1 | Null() |
6 | BB | 2015-01-12 | 1 | 0 | 1 | 1 | 1 |
7 | BB | 2015-01-15 | 0 | 1 | Null() | 1 | Null() |
I guess, the difficulties i have in particular are:
- Set the two new fields HasAnswerAttachedToPost and HasContentAttachedToPost properly in case a user has two or more LogInRecords with LeftPost=1, but AnsweredPost=0 and HasUploadedContent=0 on the same day. In that case i need to consider all of these records when checking for following LogIns of this user for potential Answer and / or Upload attachments where LeftPost = 0.
Hope, my problem is clear & thanks a lot in advance for any help, hints or suggestions!