TextBetween or Subfield? - Qlik Community

johnca · ‎2013-02-21

I'm trying to extract specific data from rows that are not always in the same position. The data I'm interested in is divided into 3 parts and separated by periods ".". Some data only contains 2 parts. Each data field is preceeded and followed by a space, such as;

ABC , 123-All- , Fixed , Help.Me.FindThis , Or Else , Default , Default (data is in 4th position)

DEF , Fixed , Help.Me.GetThis , Or Else , Default , Default (data is in 3rd position)

X , 123-All- , Broken , Beautiful , Get.This.Data , Or Else , Default , Default (data is in 5th position)

ABC , 123-All- , Broken , Careful , Get.This. , Easy as pie , Hard as nails , Bye (data is in 5th position and 3rd field is blank)

There's actually 5 or 6 different positions the required data would be in.

And each data would be separated into 3 new fields, Field1, Field2, and Field3.

So, my results would be;

Field1	Field2	Field3
Help	Me	FindThis
Help	Me	GetThis
Get	This	Data
Get	This

I've been trying nested IF's using TextBetween and have gotten good results, but still can't get 100% of the data extracted, and if some new "IF" came along I'd have to redo the nested IF's. I was hoping someone else has run across this and has a better mouse trap. Perhaps using SubField?

Message was edited by: johncaqc Fixed data in table

rwunderlich · ‎2013-02-21

Try this out. QVW sample attached as well.

-Rob

http://robwunderlich.com

LOAD

//Phrase,

SubField(Phrase,'.',1) as Field1

,SubField(Phrase,'.',2) as Field2

,SubField(Phrase,'.',3) as Field3

;

LOAD

Phrase

Where Phrase LIKE '*.*.*'

;

LOAD

subfield(RawData, ',') as Phrase

;

LOAD * INLINE [

RawData

ABC , 123-All- , Fixed , Help.Me.FindThis , Or Else , Default , Default (data is in 4th position)

DEF , Fixed , Help.Me.GetThis , Or Else , Default , Default (data is in 3rd position)

X , 123-All- , Broken , Beautiful , Get.This.Data , Or Else , Default , Default (data is in 5th position)

ABC , 123-All- , Broken , Careful , Get.This. , Easy as pie , Hard as nails , Bye (data is in 5th position and 3rd field is blank)

]

(delimiter is ';')

;

johnca · ‎2013-02-21

When I try SubField() a la;

SubField(Concatenated_Segments,'.',1) as Field1,

SubField(Concatenated_Segments,'.',2) as Field2,

SubField(Concatenated_Segments,'.',3) as Field3,

it gets me close too, but it returns too many segments/characters for the Field1 and Field3 positions. Plus, I found I have data where some preceeding segments have periods in them and it returns segments too early. An example of one row where this is the case is;

ABC , 1.0.0.NA , Come On Man , Help.Me.FindThis , Or Else , Default , Default

johnca · ‎2013-02-21

Thanks Rob. I'm playing with this now...will need some tweaking as the phrase format is more like '*,*,*.*.*.*,' where the data I'm looking for is, I believe, at least 2 commas into the field. I don't think it's less than 3 commas in...but this looks a lot more easy to maintain than my multiline IF's. 🙂

However, the last line where you specify the delimiter as ';' confuses me. There's no semi-colons in the data.

rwunderlich · ‎2013-02-21

Hi John,

I think you are misunderstanding. It doesn't matter how many commas in the phrase is. It will still be located. The preceeding load does things one piece at a time.

After the

LOAD

subfield(RawData, ',') as Phrase

,RawData

;

Phrase Contains:

ABC

123-All-

Fixed

Help.Me.FindThis

Or Else

Default

Default (data is in 4th position)

Then

LOAD

Phrase

Where Phrase LIKE '*.*.*'

;

filters the Phrase list down to

Help.Me.FindThis

and lastly

LOAD

Phrase,

SubField(Phrase,'.',1) as Field1

,SubField(Phrase,'.',2) as Field2

,SubField(Phrase,'.',3) as Field3

;

breaks that Phrase into 3 fields.

So it doesn't matter how far in the *.*.* phrase is.

-Rob

phaneendra_kunc · ‎2013-02-21

You want this "1.0.0.NA" to be present in final data set or you dont want?

johnca · ‎2013-02-21

Hi Rob, you're right in most cases, and it does seem to work for the initial data I presented. However, in the examples below (of almost real data) I need to return the bold entries but get 1, 0 and 0 or 1, 1 and 0 for the 3rd and 4th rows. That's why I was thinking of the alternative phrase formatting.

Fieldname = CONCATENATED_SEGMENTS

800-All- , YES , COMMON.CC.HUB , Christ, Jesus , J9 , RAW , DEFAULT.DEFAULT.DEFAULT. , Misc

800-All- , NO , QWBS.RAW.ACU , RAW , DEFAULT.DEFAULT.DEFAULT. , Misc

755-All- , 1.0.0.NA , NO , COMMON.RAW.FIXED , OTHR , Powderpuff, Celia , 82 , RAW

QED , 1.1.0.NA , NO , COMMON.WIP. , RAW , MRO

105-All- , NO , COMMON.RAW. , Christ, Jesus , J9 , RAW , DEFAULT.DEFAULT.DEFAULT. , MRO

EQM , 100-All- , NO , CORP.WIP.ODU , Powderpuff, Celia , LV , RAW

Regards,

John

johnca · ‎2013-02-21

Hi Phaneendra,

No, I need;

Help

Me

FindThis

Please see the previous response to Rob with better examples.

rwunderlich · ‎2013-02-21

The default delimiter in an INLINE load is comma. Since you had commas in your data, I had to override to something that wasn't present in your data.

To start parsing after the second comma, just add this as a load before the LOAD INLINE.

LOAD

mid(RawData, index(RawData, ',', 2)) as RawData

;

-Rob

phaneendra_kunc · ‎2013-02-21

May be you can choose below filter options to get rid of the data you dont need.

In rob's example QVW there is a second LOAD statement where you have to keep these filters

1.) allow the string that has less than or equal to two dots "."

SubStringCount(Phrase,'.') <=2

LOAD

Phrase

Where Phrase LIKE '*.*.*' and SubStringCount(Phrase,'.') <=2 ;

2.) Dont allow any string that has numbers.

len(KeepChar(Phrase,'0123456789')) = 0

LOAD

Phrase

Where Phrase LIKE '*.*.*' and

len(KeepChar(Phrase,'0123456789')) = 0 ;