Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
johnca
Specialist
Specialist

TextBetween or Subfield?

I'm trying to extract specific data from rows that are not always in the same position. The data I'm interested in is divided into 3 parts and separated by periods ".". Some data only contains 2 parts. Each data field is preceeded and followed by a space, such as;

ABC , 123-All- , Fixed , Help.Me.FindThis , Or Else , Default , Default     (data is in 4th position)

DEF , Fixed , Help.Me.GetThis , Or Else , Default , Default     (data is in 3rd position)

X , 123-All- , Broken , Beautiful , Get.This.Data , Or Else , Default , Default     (data is in 5th position)

ABC , 123-All- , Broken , Careful , Get.This. , Easy as pie , Hard as nails , Bye     (data is in 5th position and 3rd field is blank)

There's actually 5 or 6 different positions the required data would be in.

And each data would be separated into 3 new fields, Field1, Field2, and Field3.

So, my results would be;

Field1Field2Field3
HelpMeFindThis
HelpMeGetThis
GetThisData
GetThis

I've been trying nested IF's using TextBetween and have gotten good results, but still can't get 100% of the data extracted, and if some new "IF" came along I'd have to redo the nested IF's. I was hoping someone else has run across this and has a better mouse trap. Perhaps using SubField?

Message was edited by: johncaqc Fixed data in table

10 Replies
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Try this out. QVW sample attached as well.

-Rob

http://robwunderlich.com

LOAD

          //Phrase,

          SubField(Phrase,'.',1) as Field1

          ,SubField(Phrase,'.',2) as Field2

          ,SubField(Phrase,'.',3) as Field3

;

LOAD

          Phrase

Where Phrase LIKE '*.*.*'

;

LOAD

          subfield(RawData, ',') as Phrase

;

LOAD * INLINE [

RawData

ABC , 123-All- , Fixed , Help.Me.FindThis , Or Else , Default , Default     (data is in 4th position)

DEF , Fixed , Help.Me.GetThis , Or Else , Default , Default     (data is in 3rd position)

X , 123-All- , Broken , Beautiful , Get.This.Data , Or Else , Default , Default     (data is in 5th position)

ABC , 123-All- , Broken , Careful , Get.This. , Easy as pie , Hard as nails , Bye     (data is in 5th position and 3rd field is blank)

]

(delimiter is ';')

;

johnca
Specialist
Specialist
Author

When I try SubField() a la;

SubField(Concatenated_Segments,'.',1) as Field1,

SubField(Concatenated_Segments,'.',2) as Field2,

SubField(Concatenated_Segments,'.',3) as Field3,

it gets me close too, but it returns too many segments/characters for the Field1 and Field3 positions. Plus, I found I have data where some preceeding segments have periods in them and it returns segments too early. An example of one row where this is the case is;

ABC , 1.0.0.NA , Come On Man , Help.Me.FindThis , Or Else , Default , Default

johnca
Specialist
Specialist
Author

Thanks Rob. I'm playing with this now...will need some tweaking as the phrase format is more like '*,*,*.*.*.*,' where the data I'm looking for is, I believe, at least 2 commas into the field. I don't think it's less than 3 commas in...but this looks a lot more easy to maintain than my multiline IF's. 🙂

However, the last line where you specify the delimiter as ';' confuses me. There's no semi-colons in the data.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi John,

I think you are misunderstanding. It doesn't matter how many commas in the phrase is. It will still be located. The preceeding load does things one piece at a time.

After the

LOAD

          subfield(RawData, ',') as Phrase

          ,RawData

;

Phrase Contains:

ABC

123-All-

Fixed

Help.Me.FindThis

Or Else

Default

Default     (data is in 4th position)

Then

LOAD

          Phrase

Where Phrase LIKE '*.*.*'

;

filters the Phrase list down to

Help.Me.FindThis

and lastly

LOAD

          Phrase,

          SubField(Phrase,'.',1) as Field1

          ,SubField(Phrase,'.',2) as Field2

          ,SubField(Phrase,'.',3) as Field3

;

breaks that Phrase into 3 fields.

So it doesn't matter how far in the *.*.* phrase is.

-Rob

phaneendra_kunc
Partner - Specialist III
Partner - Specialist III

You want this  "1.0.0.NA" to be present in final data set or you dont want?

johnca
Specialist
Specialist
Author

Hi Rob, you're right in most cases, and it does seem to work for the initial data I presented. However, in the examples below (of almost real data) I need to return the bold entries but get 1, 0 and 0 or 1, 1 and 0 for the 3rd and 4th rows. That's why I was thinking of the alternative phrase formatting.

Fieldname = CONCATENATED_SEGMENTS

800-All- , YES , COMMON.CC.HUB , Christ, Jesus , J9 , RAW , DEFAULT.DEFAULT.DEFAULT. , Misc

800-All- , NO , QWBS.RAW.ACU , RAW , DEFAULT.DEFAULT.DEFAULT. , Misc

755-All- , 1.0.0.NA , NO , COMMON.RAW.FIXED , OTHR , Powderpuff, Celia , 82 , RAW

QED , 1.1.0.NA , NO , COMMON.WIP. , RAW , MRO

105-All- , NO , COMMON.RAW. , Christ, Jesus , J9 , RAW , DEFAULT.DEFAULT.DEFAULT. , MRO

EQM , 100-All- , NO , CORP.WIP.ODU , Powderpuff, Celia , LV , RAW

Regards,

John

johnca
Specialist
Specialist
Author

Hi Phaneendra,

No, I need;

Help

Me

FindThis

Please see the previous response to Rob with better examples.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

The default delimiter in an INLINE load is comma. Since you had commas in your data, I had to override to something that wasn't present in your data.

To start parsing after the second comma, just add this as a load before the LOAD INLINE.

LOAD

          mid(RawData, index(RawData, ',', 2)) as RawData

;

-Rob

phaneendra_kunc
Partner - Specialist III
Partner - Specialist III

May be you can choose below filter options to get rid of the data you dont need.

In rob's example QVW there is a second LOAD statement where you have to keep these filters

1.) allow the string that has less than or equal to two dots "."

SubStringCount(Phrase,'.') <=2

LOAD

    Phrase

Where Phrase LIKE '*.*.*'    and  SubStringCount(Phrase,'.') <=2 ;

2.) Dont allow any string that has numbers.

len(KeepChar(Phrase,'0123456789')) = 0

LOAD

    Phrase

Where Phrase LIKE '*.*.*'    and 

len(KeepChar(Phrase,'0123456789')) = 0 ;