I am embarking on a data cleaning project in one of our applications, and one of the fields I want to use in analysis for this dashboard is "Business Contact". The field in our application is open text, and we generally ask people to enter either an internal mailing list (e.g., *MailingList) or their single-sign-on ID (e.g., AB12345). Of course, not everybody follows directions so we will sometimes get just someone's name (e.g., John Smith).
What I would like to do is pull this text into two columns and disect the strings, so today I might have:
ItemID
Contact
1
AB12345
2
*MailingListOne
3
*MailingListTwo, BC12345
4
CD12345, DE12345 or *MailingListThree
5
John Smith
and I want to turn this into something like this:
ItemID
Contact
ContactType
1
AB12345
Single-Sign-On
2
*MailingListOne
Mailing List
3
*MailingListTwo
Mailing List
3
BC12345
Single-Sign-On
4
CD12345
Single-Sign-On
4
DE12345
Single-Sign-On
4
*MailingListThree
Mailing List
5
John Smith
Other
I'm facing two challenges...
First, there is no common delimiter. There might be a comma, a semi-colon, etc. Can I simply pull out any string of characters that starts with an asterisk or contains two letters followed by five numbers?
Second, once I isolate these via substring, how can I shape logic of the third column that recognizes two letters and 5 numbers as "Single Sign On"?
Thanks for any help or suggestions you can provide!