Hello,
I am embarking on a data cleaning project in one of our applications, and one of the fields I want to use in analysis for this dashboard is "Business Contact". The field in our application is open text, and we generally ask people to enter either an internal mailing list (e.g., *MailingList) or their single-sign-on ID (e.g., AB12345). Of course, not everybody follows directions so we will sometimes get just someone's name (e.g., John Smith).
What I would like to do is pull this text into two columns and disect the strings, so today I might have:
ItemID | Contact |
1 | AB12345 |
2 | *MailingListOne |
3 | *MailingListTwo, BC12345 |
4 | CD12345, DE12345 or *MailingListThree |
5 | John Smith |
and I want to turn this into something like this:
ItemID | Contact | ContactType |
1 | AB12345 | Single-Sign-On |
2 | *MailingListOne | Mailing List |
3 | *MailingListTwo | Mailing List |
3 | BC12345 | Single-Sign-On |
4 | CD12345 | Single-Sign-On |
4 | DE12345 | Single-Sign-On |
4 | *MailingListThree | Mailing List |
5 | John Smith | Other |
I'm facing two challenges...
First, there is no common delimiter. There might be a comma, a semi-colon, etc. Can I simply pull out any string of characters that starts with an asterisk or contains two letters followed by five numbers?
Second, once I isolate these via substring, how can I shape logic of the third column that recognizes two letters and 5 numbers as "Single Sign On"?
Thanks for any help or suggestions you can provide!