Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I need to build the logic in script for below data scenario
ID text_description search_string
1 abc;N:xyz;abc;N:pqr;K:123;N:123; N
2 B:abc;xyz;abc;B:123;xyz;N:123; B
3 123;C:test;C:234;abc;pqr;B:xyz; C
text_description always starts with some texts and end with ';' and in between texts are also separated by ';'.
I want to search value of search_string column in text_description column and extract the part of string with below logic
Let's take a example of ID =1
text_description is abc;N:xyz;abc;N:pqr;K:123;N:123;
search_string is 'N'
Now I need to search 'N' followed by ':' in text_description and extract the text between 'N:'& ';'
'N:' could be anywhere in the string and could be repeated multiple times. Hence for each occurrence we need to extract the texts
Output for ID=1 will be
ID text_description String
1 abc;N:xyz;abc;N:pqr;K:123;N:123; xyz;pqr
Similar logic should work for other search string.
This is fake data as sample data will be difficult to share but in actual scenario format of the data will be same.
Any help will be really appreciated.
another approach if you don't want to increase your data using interno() and group it later just to have simple load statements
T1:
load ID,
trim(text_description) as text_description,
trim(search_string) as search_string
SubStringCount(trim(text_description),trim(search_string)&':') as count_search_string
FROM Data;
Max:
Load max(count_search_string) as max_count_search_string;
Load FieldValue('count_search_string',RecNo()) as count_search_string
AutoGenerate FieldValueCount('count_search_string');
Let vMaxCountSearchString = Peek('max_count_search_string',0,'Max');
let vTextDescriptionFieldName ='text_description'; // text description field name in your data
let vSearchStringFieldName = 'search_string'; // search string field name in your data
Let vFormula1 = '(TextBetween($(vTextDescriptionFieldName),$(vSearchStringFieldName)&'
&chr(39)&':'&chr(39)&','&chr(39)&';'&chr(39)&')&';
for i=2 to '$(vMaxCountSearchString)'
let vFormula1 = '$(vFormula1)' &chr(39)&';'&chr(39)&'&TextBetween($(vTextDescriptionFieldName),
$(vSearchStringFieldName)&'&chr(39)&':'&chr(39)&','&chr(39)&';'&chr(39)&',$(i))&';
Next
let vFormula2 = mid('$(vFormula1)',1,len('$(vFormula1)')-1)&')';
let vFormula = 'left('&'$(vFormula2)'&','&
'len('&'$(vFormula2)'&')'&' - ('&'$(vMaxCountSearchString)'&'- count_search_string))';
let vFormula1 =Null();
let vFormula2 =Null();
T2:
Load *,
$(vFormula) as String
Resident T1;
Drop Table T1;
With Group By.
tab1:
LOAD *, TextBetween(text_description,search_string&':',';',IterNo()) As Word,
IterNo() As StrIx
While IterNo() <= SubStringCount(text_description,search_string&':');
LOAD * INLINE [
ID, text_description, search_string
1, abc;N:xyz;abc;N:pqr;K:123;N:123;, N
2, B:abc;xyz;abc;B:123;xyz;N:123;, B
3, 123;C:test;C:234;abc;pqr;B:xyz;, C
];
Left Join(tab1)
LOAD ID, Concat(Word,';',StrIx) As String
Resident tab1
Group By ID;
One Solution is.
tab1:
LOAD *, TextBetween(text_description,search_string&':',';',IterNo()) As Word
While IterNo() <= SubStringCount(text_description,search_string&':');
LOAD * INLINE [
ID, text_description, search_string
1, abc;N:xyz;abc;N:pqr;K:123;N:123;, N
2, B:abc;xyz;abc;B:123;xyz;N:123;, B
3, 123;C:test;C:234;abc;pqr;B:xyz;, C
];
Output.
another approach if you don't want to increase your data using interno() and group it later just to have simple load statements
T1:
load ID,
trim(text_description) as text_description,
trim(search_string) as search_string
SubStringCount(trim(text_description),trim(search_string)&':') as count_search_string
FROM Data;
Max:
Load max(count_search_string) as max_count_search_string;
Load FieldValue('count_search_string',RecNo()) as count_search_string
AutoGenerate FieldValueCount('count_search_string');
Let vMaxCountSearchString = Peek('max_count_search_string',0,'Max');
let vTextDescriptionFieldName ='text_description'; // text description field name in your data
let vSearchStringFieldName = 'search_string'; // search string field name in your data
Let vFormula1 = '(TextBetween($(vTextDescriptionFieldName),$(vSearchStringFieldName)&'
&chr(39)&':'&chr(39)&','&chr(39)&';'&chr(39)&')&';
for i=2 to '$(vMaxCountSearchString)'
let vFormula1 = '$(vFormula1)' &chr(39)&';'&chr(39)&'&TextBetween($(vTextDescriptionFieldName),
$(vSearchStringFieldName)&'&chr(39)&':'&chr(39)&','&chr(39)&';'&chr(39)&',$(i))&';
Next
let vFormula2 = mid('$(vFormula1)',1,len('$(vFormula1)')-1)&')';
let vFormula = 'left('&'$(vFormula2)'&','&
'len('&'$(vFormula2)'&')'&' - ('&'$(vMaxCountSearchString)'&'- count_search_string))';
let vFormula1 =Null();
let vFormula2 =Null();
T2:
Load *,
$(vFormula) as String
Resident T1;
Drop Table T1;
If you don't have huge data set then use what @Saravanan_Desingh suggested. It is simple and easy to understand. Just to add on if you want the output as ';' separated then use concat(String,';') either as measure in chart or in script using group by on further load.
With Group By.
tab1:
LOAD *, TextBetween(text_description,search_string&':',';',IterNo()) As Word,
IterNo() As StrIx
While IterNo() <= SubStringCount(text_description,search_string&':');
LOAD * INLINE [
ID, text_description, search_string
1, abc;N:xyz;abc;N:pqr;K:123;N:123;, N
2, B:abc;xyz;abc;B:123;xyz;N:123;, B
3, 123;C:test;C:234;abc;pqr;B:xyz;, C
];
Left Join(tab1)
LOAD ID, Concat(Word,';',StrIx) As String
Resident tab1
Group By ID;
Output.
Thank you. Both work fine as expected