Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a field that pulls in [Notes] data, which can be anything from a sentence, a phone number, or other general notes. In a certain scenario, the data I am looking for is in a specific format: a letter, followed by 6 digits (E123456).
I am trying to separate this data to work with independently, and I am wondering how I could create a function in my load script to look only at this data?
As I said, this data happens only in a specific scenario. Something like [Role]={'ETC','LS'}. If I leave the data as-is, I am able to build my chart with set analysis, but I get separate entries for upper and lower case versions, as well as some repeats that I haven't yet explained. I figured if I pull this data out independently I can use UPPER() on it to eliminate that inconsistency, and I'll otherwise have the flexibility to distil it down as needed. Ultimately, I want to be able to SUM() a [Cost] value across each ID.
I suspect I could use TRIM() to eliminate leading and trailing spaces, then use SUBSRINGCOUNT() with an IF() statement to look for entries that don't have any spaces remaining. That should leave me with single word entries only, of which my data will be a part of. But that won't quite narrow me down to 7 characters, of which only the first is a letter.
If I'm on the right track- or maybe completely off base- I would appreciate some suggestions to accomplish this.
A follow up. I have edited this comment from one where I identified the failure to one that works.
I was able to get a working solution by combining the information discussed here with an inline list of a dozen or so [Support] entries. That further eliminated the possibility of unwanted data. I also added an 'error' response to the IF() statement in the initial load, so on the off chance there is unwanted data that makes it through the SUBSTRINGCOUNT and the WHERE clause in the second half, I can see it right away and make a correction.
While not future-proof, it is future resistant. This solution is enough for my needs.
My version of the initial [Token] load:
Temp:
LOAD *,
IF(SUBSTRINGCOUNT(Token,' ')=0,Token,'Error') as [LS ID Temp];
LOAD [Booking ID],
Support,
UPPER(TRIM([Notes])) as Token
RESIDENT Table
WHERE EXISTS ([List I created], Support);
Followed by your WHERE clause- only using a RESIDENT load rather than a preceding load:
Temp2:
LOAD [Booking ID],
Support,
[LS ID Temp] as [LS ID]
RESIDENT LiveStreamingIDTemp
WHERE LEN([LS ID Temp])=7 AND
NOT(ISNULL(KEEPCHAR(LEFT([LS ID Temp],1), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))) AND
ISNUM(RIGHT([LS ID Temp],6));
Appending the execution of my own suggestion, which is actually working but isn't future-proof. Right now, only the data I need comes up within the set analysis listed above, but if someone were to enter something else in this [Notes] field at a future date, it would come through.
LOAD *,
IF(SUBSTRINGCOUNT([LS ID Temp],' ')=0,[LS ID Temp]) as [LS ID];
LOAD [Booking ID],
UPPER(TRIM([Notes])) as [LS ID Temp]
RESIDENT Table;
And a chart with [LS ID] as the dimension and an expression:
SUM({$<Role={'Live Streaming','ETC'}>}[Calculated Total])
So ultimately, I am looking to ensure only my "letter + 6 digits" format data ends up in [LS ID], so that if someone enters something like "2P-4P" (a time range, which is a common entry in the [Notes] field that I want to eliminate) in the [Notes] field associated with the [Role] I am working with, it will not start appearing on my chart.
Do you want to extract tokens (delimited by spaces?) from a text steam and then search for those that are a letter followed by 6 digits? I have no idea what you mean about scenarios and [Role] and the rest of that paragraph and how that fits into the rest of the post.
To split a text field into tokens, use SubField without the last parameter, Then you can filter the tokens that match your requirements. Something like
LOAD ID,
Token,
Left(Token, 1) as Prefix,
Num(Right(Token, 6) as Number
Where
Len(Token) = 7 And
KeepChar(Left(Token, 1), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') = 1 And
IsNum(Right(Token, 6)
;
LOAD
ID,
Upper(SubField(Notes, ' ')) as Token
...
Unfortunately, this doesn't work. The [Token] field ends up with no values. I wonder if it is because anything that passes the WHERE clause in the preceding load would not be delimited by a space. Spaces would only occur in data that is outside of my scope.
How should UPPER(SUBFIELD([Notes],' ')) as Token respond if there are no spaces? I expect it is giving a null() result in those cases.
I could add IF(ISNULL(UPPER(SUBFIELD([Notes],' '))),[Notes]) as Token to deal with this, but I wonder if there is a cleaner way?
Edit: that didn't work
A follow up. I have edited this comment from one where I identified the failure to one that works.
I was able to get a working solution by combining the information discussed here with an inline list of a dozen or so [Support] entries. That further eliminated the possibility of unwanted data. I also added an 'error' response to the IF() statement in the initial load, so on the off chance there is unwanted data that makes it through the SUBSTRINGCOUNT and the WHERE clause in the second half, I can see it right away and make a correction.
While not future-proof, it is future resistant. This solution is enough for my needs.
My version of the initial [Token] load:
Temp:
LOAD *,
IF(SUBSTRINGCOUNT(Token,' ')=0,Token,'Error') as [LS ID Temp];
LOAD [Booking ID],
Support,
UPPER(TRIM([Notes])) as Token
RESIDENT Table
WHERE EXISTS ([List I created], Support);
Followed by your WHERE clause- only using a RESIDENT load rather than a preceding load:
Temp2:
LOAD [Booking ID],
Support,
[LS ID Temp] as [LS ID]
RESIDENT LiveStreamingIDTemp
WHERE LEN([LS ID Temp])=7 AND
NOT(ISNULL(KEEPCHAR(LEFT([LS ID Temp],1), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))) AND
ISNUM(RIGHT([LS ID Temp],6));