Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I've been looking at the various text manipulation options in script but cannot seem to find a way forward so need the communities help.
I have a list of data that comes through with headers as follows:
AA10P
AA10C
AA10SPF
AA11.1P
AA11.1C
AA11.1SPF
AA23.2/45P
AA23.2/45C
AA23.2/45SPF
The unique fields above cover only 3 industry codes, AA10, AA11.1 and AA23.2
If these were of a fixed length I could just use a left( ) function but the size varies.
Additionally, I also want to pull out the measure portion which in this example is the P, C and SPF values shown at the end of each string.
How can I do this in the script?
If it helps, there are always 2 letters to start off for each industry (ie there will always be 2 letters like AA followed by a set of values that are just numbers (eg AA10), have decimal points (eg AA11.1) or have a another decimal point and/or separator (eg AA23.2/45)
Please help
sounds like this could be a solution:
table1:
LOAD *,
Left(Header,FindOneOf(Header,PurgeChar(Header,'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),-1)) as IndustryCode,
Mid(Header,FindOneOf(Header,PurgeChar(Header,'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),-1)+1) as Measure
Inline [
Header
AA10P
AA10C
AA10SPF
AA11.1P
AA11.1C
AA11.1SPF
AA23.2/45P
AA23.2/45C
AA23.2/45SPF
AB1A
AB1\2B
AB1.1C
AB1.1|23D
AB12E
AB12.3FG
AB123.45H
AB123.456/789IJ
BB1
BB23
BB34.5
BB456.7A
BB5678.90/0.12BC
];
hope this helps
regards
Marco
try like this:
Load *, if(WildMatch(Test,'*.*')=1,subfield(Test,'.',1)&'.'&left(SubField(Test,'.',2),1),left(Test,4)) as New,
Purgechar(PurgeChar(Test,if(WildMatch(Test,'*.*')=1,subfield(Test,'.',1)&'.'&left(SubField(Test,'.',2),1),left(Test,4))),'/') as New1;
LOAD * INLINE [
Test
AA10P
AA10C
AA10SPF
AA11.1P
AA11.1C
AA11.1SPF
AA23.2/45P
AA23.2/45C
AA23.2/45SPF
];
hi try this,
temp:
LOAD header,
IF(WildMatch(header,'AA10*')=1,'AA10',if(WildMatch(header,'AA11.1*')=1,'AA11.1',if(WildMatch(header,'AA23.2*')=1,'AA23.2',''))) as industrycode
FROM [data.xlsx] (ooxml, embedded labels, table is Sheet4);
left join(temp)
load
header,
PurgeChar(header,industrycode)as separator
Resident temp;
May be try this?
LOAD *,
Left(TextVal, Index(TextVal, KeepChar(PurgeChar(TextVal,'AA'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 1)-1) AS NewFieldValue INLINE [
TextVal
AA10P
AA10C
AA10SPF
AA11.1P
AA11.1C
AA11.1SPF
AA23.2/45P
AA23.2/45C
AA23.2/45SPF
];
Hi,
another solution might be:
table1:
LOAD *,
Left(Header,FindOneOf(Mid(Header,3)&'/',PurgeChar(Mid(Header,3),'0123456789.')&'/')+1) as IndustryCode,
Mid(Header,FindOneOf(Header,PurgeChar(Header,'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),-1)+1) as Measure
Inline [
Header
AA10P
AA10C
AA10SPF
AA11.1P
AA11.1C
AA11.1SPF
AA23.2/45P
AA23.2/45C
AA23.2/45SPF
AB1A
AB1\B
AB1.1C
AB1.1|D
AB12E
AB12.3FG
AB123.45H
AB123.456/IJ
BB1
BB23
BB34.5
BB456.7A
BB5678.90/BC
];
hope this helps
regards
Marco
Thanks everyone! These are great.. Marco's gets closest but there is just one more thing.
Marco, one of the sectors AA23.2/45 needs to be shown as that sector... your code makes it AA23.2 only.
How do I include that last part.. once I've got that I can mark your answer as correct!
Thank you again
Did you try my expr?
LOAD *,
Left(TextVal, Index(TextVal, KeepChar(PurgeChar(TextVal,'AA'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 1)-1) AS NewFieldValue,
KeepChar(PurgeChar(TextVal,'AA'), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') AS MeasureCode INLINE [
TextVal
AA10P
AA10C
AA10SPF
AA11.1P
AA11.1C
AA11.1SPF
AA23.2/45P
AA23.2/45C
AA23.2/45SPF
];
Sorry ignore my expr works only if the values start with AA. Probably you have different start characters right?
Isn't that contradicting your initial statement that "The unique fields above cover only 3 industry codes, AA10, AA11.1 and AA23.2"?
HI Marco - sorry if I was not clear and I appreciate your help as I did not even get 50% of the answer and you guys have done a lot.
The use cases are not only for those numbers but for any type of 2 letter initial region code, a sector number and a measure designation (SPF, C , M ... any combo really).
The examples were only 3 and the output was to be something like AA10, AA11.1 or AA23.2/45... OR it could be UN35.23 or XY55.3/5
each of the above i would call a sector code... any letters that come after are what I call measures
Thanks