Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
I have below dataset. Basially one field with text values in specific format
load * inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc:xyz;pqr-abc ];
Output:
Text | Flag | Value |
abc-123;abc-345;abc-xyz | Matched | abc |
abc-123;xyz-345 | Not Matched | xyz |
abc-123;xyz-345;abc-xyz;pqr-abc | Not Matched | pqr |
Basically, text values are separated by semicolon(;). So for each part of the semicolon, first value separated by "-" need to be cheked, and if values are same for all of the part then dispay that. If not same then pick the value from last part of the "-"
For eg. abc-123;abc-345;abc-xyz, here highlighted values in each part are same, hence display "abc" and also dislay flag as "Matched"
abc-123;xyz-345, here highlighted values are different in both the parts, hence display "xyx" from last part and display flag as "Not Matched"
Hope I have explained it properly.
@sunny_talwar @tresesco @Kushal_Chawda @Vegar @marcus_sommer @MayilVahanan @Taoufiq_Zarra @stevejoyce
Hi
Try like below
tab1:
Load *, if(NoOfText = NoOfResult, 'Matched', 'Not Matched') as Flag;
load *,SubStringCount(Text, ';')+1 as NoOfText, SubStringCount(Text, Left(SubField(Text,';',-1), Index(Text,'-')-1)) as NoOfResult, Left(SubField(Text,';',-1), Index(Text,'-')-1) as Result inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc:xyz;pqr-abc
];
@Santosh_K another way
tab1:
LOAD *,
SubField(SubField(Text,';',-1),'-',1) as Value,
if(SubStringCount(Text,subfield(Text,'-',1)&'-')=SubStringCount(Text,';')+1,'Matched','Not Matched') as Flag
inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc-xyz;pqr-abc
];
Try this,
tab1:
load * inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc:xyz;pqr-abc
];
Left Join(tab1)
LOAD *, SubField(Text,';') As T1
Resident tab1
;
Left Join(tab1)
LOAD *, SubField(T1,'-',1) As T2, RowNo() As N1
Resident tab1
;
Left Join(tab1)
LOAD Text, FirstSortedValue(T2, -N1) As Value
Resident tab1
Group By Text
;
Drop Fields T1, T2, N1;
Output:
So, basically if all the values are same or not, still pulling the last value will get the Result.
Hi @Santosh_K
Try like below
tab1:
load *, Left(SubField(Text,';',-1), Index(Text,'-')-1) as Result inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc:xyz;pqr-abc
];
o/p:
@MayilVahanan @Saravanan_Desingh thanks for your reply, but I always need to check values are equal or not in all parts, because I need to create a flag which says, matched or not. I have updated required output in my OP
Hi
Try like below
tab1:
Load *, if(NoOfText = NoOfResult, 'Matched', 'Not Matched') as Flag;
load *,SubStringCount(Text, ';')+1 as NoOfText, SubStringCount(Text, Left(SubField(Text,';',-1), Index(Text,'-')-1)) as NoOfResult, Left(SubField(Text,';',-1), Index(Text,'-')-1) as Result inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc:xyz;pqr-abc
];
@Santosh_K another way
tab1:
LOAD *,
SubField(SubField(Text,';',-1),'-',1) as Value,
if(SubStringCount(Text,subfield(Text,'-',1)&'-')=SubStringCount(Text,';')+1,'Matched','Not Matched') as Flag
inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc-xyz;pqr-abc
];
Try this,
tab1:
load * inline [
Text
abc-123;abc-345;abc-xyz
abc-123;xyz-345
abc-123;xyz-345;abc:xyz;pqr-abc
];
Left Join(tab1)
LOAD *, SubField(Text,';') As T1
Resident tab1
;
Left Join(tab1)
LOAD *, SubField(T1,'-',1) As T2, RowNo() As N1
Resident tab1
;
Left Join(tab1)
LOAD Text, If(Concat(DISTINCT T2)=FirstSortedValue(T2, -N1),Concat(DISTINCT T2),FirstSortedValue(T2, -N1)) As Value
Resident tab1
Group By Text
;
Drop Fields T1, T2, N1;
@Santosh_K assuming that your text values always separated by ";' & always separated by "-" between parts then refer to my previous reply