Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
hi,
I have a table and I would like to include only user ids that reached Stage 3(users can go into Stage 1, then Stage 2 and so on, but it's possible a user stages stops at Stage 2. however, I would like to make sure I have only users who went to Stage 1, Stage 2 and Stage 3). One user can't get to Stage 3 without going into Stage 1 and then Stage 2.
Hi,
Looks like you have spaces in the stage field, so first, you should prepare your data, for instance, remove text and spaces.
Try the following:
//Create a new field that contains only the numeric part of the stage field
Data:
LOAD
*,
KeepChar(stage, '0123456789 ') AS StageNum;
Load * inline [
UserID,UserName,date,stage
1, abc, 2022-01-01, Stage1
1, abc, 2022-02-01, Stage2
1, abc, 2022-05-01, Stage3
1, abc, 2022-05-01, Stage4
2, hgb, 2022-01-01, Stage1
2, hgb, 2022-02-01, Stage2
2, hgb, 2022-05-01, Stage3
2, hgb, 2022-05-01, Stage 4
3, hgh, 2022-01-01, Stage1
3, hgh, 2022-02-01, Stage2
3, hgh, 2022-05-01, Stage3
4, gdf, 2022-05-01, Stage 1
];
//Find max StageNum for each user
//Join UserMaxStage to Data table
//UserMaxStage can be used as a filter in set analysis
Left Join(Data)
MaxStage:
LOAD
UserID,
Max(StageNum) AS UserMaxStage
Resident Data
Group By UserID;
//Drop StageNum field from the Data Model
DROP FIELD StageNum;
Result:
with selected value:
Regards,
Vitalii
Is it related to the script or the UI?
- Marcus
hi, not sure if it can be done in the UI, I guess only in the script.
I want to select users that went to all 3 stages and make sure only those users will be visible in that Table Box.
Thank you
Both ways should be possible - in script:
temp: load distinct UserID from X where stage = 'Stage3';
final: load *, exists(UserID) as StageFlag from X;
and the flag could be then used as selection or set analysis condition.
Within the UI you may also apply something like this:
count({< UserID = p({< stage = {'Stage3'}>} UserID)>} UserName)
- Marcus
I already have this in the script.
Load
user_id as [User ID],
user_name as [User Name]
stage as [Stage],
date as [Date];
SQL
SELECT
user_id,
username,
stage,
date
from table_name; //data extracting from Redshift
Should this go before my script?
By using exists() the load-order within the script is important because the check happens against all already loaded field-values. This means you need to control the load-order whereby they mustn't be mandatory changed because you could also use a (multiples) renaming of the field and/or using exists() with two parameters. In regard to my example from above it may look like:
temp: load distinct UserID as UserIDForExists from X where stage = 'Stage3';
final: load *, exists(UserIDForExists, UserID) as StageFlag from X;
- Marcus
Hi,
You can add this after your load to get a flag
Left Join(YOUR table)
Flag:
LOAD Distinct
UserID,
'1' AS StageFlag
Resident YOUR table
Where Stage= 'Stage3'
;
Regards,
Vitalii
This can work, but it will also include users who reached Stage 4 - but I only want users who reached Stage 3 and stopped there.
Hi,
In this case try this, first you need to find max stage and then use it as a Filter
Left Join([YOUR table])
Flag:
LOAD
UserID,
1 as StageFlag
Where MaxStage = 'Stage3';
LOAD
UserID,
Max(Stage) AS MaxStage
Resident [YOUR table]
Group By UserID
;
Regards,
Vitalii
This is actually working, can this be achieved to create another pivot table where I can include only users that have reached Stage 4? (mentioning that a user can go above Stage 4 like Stage 5, Stage 6)
If I do another join where max(stage)=stage 4 is not working