Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I'm trying to extract specific data from rows that are not always in the same position. The data I'm interested in is divided into 3 parts and separated by periods ".". Some data only contains 2 parts. Each data field is preceeded and followed by a space, such as;
ABC , 123-All- , Fixed , Help.Me.FindThis , Or Else , Default , Default (data is in 4th position)
DEF , Fixed , Help.Me.GetThis , Or Else , Default , Default (data is in 3rd position)
X , 123-All- , Broken , Beautiful , Get.This.Data , Or Else , Default , Default (data is in 5th position)
ABC , 123-All- , Broken , Careful , Get.This. , Easy as pie , Hard as nails , Bye (data is in 5th position and 3rd field is blank)
There's actually 5 or 6 different positions the required data would be in.
And each data would be separated into 3 new fields, Field1, Field2, and Field3.
So, my results would be;
Field1 | Field2 | Field3 |
---|---|---|
Help | Me | FindThis |
Help | Me | GetThis |
Get | This | Data |
Get | This |
I've been trying nested IF's using TextBetween and have gotten good results, but still can't get 100% of the data extracted, and if some new "IF" came along I'd have to redo the nested IF's. I was hoping someone else has run across this and has a better mouse trap. Perhaps using SubField?
Message was edited by: johncaqc Fixed data in table
Try this out. QVW sample attached as well.
-Rob
LOAD
//Phrase,
SubField(Phrase,'.',1) as Field1
,SubField(Phrase,'.',2) as Field2
,SubField(Phrase,'.',3) as Field3
;
LOAD
Phrase
Where Phrase LIKE '*.*.*'
;
LOAD
subfield(RawData, ',') as Phrase
;
LOAD * INLINE [
RawData
ABC , 123-All- , Fixed , Help.Me.FindThis , Or Else , Default , Default (data is in 4th position)
DEF , Fixed , Help.Me.GetThis , Or Else , Default , Default (data is in 3rd position)
X , 123-All- , Broken , Beautiful , Get.This.Data , Or Else , Default , Default (data is in 5th position)
ABC , 123-All- , Broken , Careful , Get.This. , Easy as pie , Hard as nails , Bye (data is in 5th position and 3rd field is blank)
]
(delimiter is ';')
;
When I try SubField() a la;
SubField(Concatenated_Segments,'.',1) as Field1,
SubField(Concatenated_Segments,'.',2) as Field2,
SubField(Concatenated_Segments,'.',3) as Field3,
it gets me close too, but it returns too many segments/characters for the Field1 and Field3 positions. Plus, I found I have data where some preceeding segments have periods in them and it returns segments too early. An example of one row where this is the case is;
ABC , 1.0.0.NA , Come On Man , Help.Me.FindThis , Or Else , Default , Default
Thanks Rob. I'm playing with this now...will need some tweaking as the phrase format is more like '*,*,*.*.*.*,' where the data I'm looking for is, I believe, at least 2 commas into the field. I don't think it's less than 3 commas in...but this looks a lot more easy to maintain than my multiline IF's. 🙂
However, the last line where you specify the delimiter as ';' confuses me. There's no semi-colons in the data.
Hi John,
I think you are misunderstanding. It doesn't matter how many commas in the phrase is. It will still be located. The preceeding load does things one piece at a time.
After the
LOAD
subfield(RawData, ',') as Phrase
,RawData
;
Phrase Contains:
ABC
123-All-
Fixed
Help.Me.FindThis
Or Else
Default
Default (data is in 4th position)
Then
LOAD
Phrase
Where Phrase LIKE '*.*.*'
;
filters the Phrase list down to
Help.Me.FindThis
and lastly
LOAD
Phrase,
SubField(Phrase,'.',1) as Field1
,SubField(Phrase,'.',2) as Field2
,SubField(Phrase,'.',3) as Field3
;
breaks that Phrase into 3 fields.
So it doesn't matter how far in the *.*.* phrase is.
-Rob
You want this "1.0.0.NA" to be present in final data set or you dont want?
Hi Rob, you're right in most cases, and it does seem to work for the initial data I presented. However, in the examples below (of almost real data) I need to return the bold entries but get 1, 0 and 0 or 1, 1 and 0 for the 3rd and 4th rows. That's why I was thinking of the alternative phrase formatting.
Fieldname = CONCATENATED_SEGMENTS
800-All- , YES , COMMON.CC.HUB , Christ, Jesus , J9 , RAW , DEFAULT.DEFAULT.DEFAULT. , Misc
800-All- , NO , QWBS.RAW.ACU , RAW , DEFAULT.DEFAULT.DEFAULT. , Misc
755-All- , 1.0.0.NA , NO , COMMON.RAW.FIXED , OTHR , Powderpuff, Celia , 82 , RAW
QED , 1.1.0.NA , NO , COMMON.WIP. , RAW , MRO
105-All- , NO , COMMON.RAW. , Christ, Jesus , J9 , RAW , DEFAULT.DEFAULT.DEFAULT. , MRO
EQM , 100-All- , NO , CORP.WIP.ODU , Powderpuff, Celia , LV , RAW
Regards,
John
Hi Phaneendra,
No, I need;
Help
Me
FindThis
Please see the previous response to Rob with better examples.
The default delimiter in an INLINE load is comma. Since you had commas in your data, I had to override to something that wasn't present in your data.
To start parsing after the second comma, just add this as a load before the LOAD INLINE.
LOAD
mid(RawData, index(RawData, ',', 2)) as RawData
;
-Rob
May be you can choose below filter options to get rid of the data you dont need.
In rob's example QVW there is a second LOAD statement where you have to keep these filters
1.) allow the string that has less than or equal to two dots "."
SubStringCount(Phrase,'.') <=2
LOAD
Phrase
Where Phrase LIKE '*.*.*' and SubStringCount(Phrase,'.') <=2 ;
2.) Dont allow any string that has numbers.
len(KeepChar(Phrase,'0123456789')) = 0
LOAD
Phrase
Where Phrase LIKE '*.*.*' and
len(KeepChar(Phrase,'0123456789')) = 0 ;