Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
philipp_meissne
Contributor II
Contributor II

Extract strings within a longer string

Dear Community,

I searched the forums but unfortunately I could not find a solution to my problem. I try to describe it as good as possible:

I have a list of M&A Transactions, including the industry classifiactions of the targets, containing different levels of the different industries the target is active in. Entries might look like this:

Industry Classifications [Target/Issuer]

Application Software; Information Technology; Software; Software and Services; Information Technology (IT) Consulting; IT Consulting and Other Services; IT Services 

Commercial and Professional Services; Industrials; Professional Services; Research and Consulting Services

 

For mapping, I have a complete list of the different industries on different levels (extract):

Industry - Level 1Industry - Level 2Industry - Level 3Industry - Level 4
Information TechnologySemiconductors and Semiconductor EquipmentSemiconductors and Semiconductor EquipmentSemiconductor Equipment
Information TechnologySemiconductors and Semiconductor EquipmentSemiconductors and Semiconductor EquipmentSemiconductors
Information TechnologySoftware and ServicesIT ServicesInternet Services and Infrastructure
Information TechnologySoftware and ServicesSoftwareApplication Software
Information TechnologySoftware and ServicesSoftwareSystems Software

 

What I'm trying to achieve is that if I select a level of Industry (e.g. Level 1: 'Information Technology' or Level 4: 'Application Software') and as a return I see the transactions containing the industry classification somewhere in their string for industry classification.

Let's put it another way, I want to create a scatter plot (x: Transaction Value (Sum), y: Transactions (Count)) for each level of Industries, such that the bubble 'Information Technology' contains all transactions where the string 'Information Technology'  can be found in the field 'Industry Classifation.

 

I hope I was able to describe my Problem 🙂 I would really appreciate some suggestions/help on this one! Thanks so much in advance.

 

Best

Labels (2)
2 Solutions

Accepted Solutions
Vegar
MVP
MVP

 

Yes you are correct. it was a typo. The key thing to my solution is the subfield part that transforms a single semi-colon divided list to multiple rows with one semicolon entry per row. 

It looks like you did  the %IndustryClassifications twice, it will cause  your script to faile due to duplicate field names. 

Try the script below. 

Classification:
LOAD Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classifications [Target/Issuer]", ';') as Classification FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);

 

Transaction:
LOAD "CIQ Transaction ID", // Unique Transaction ID, Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);

 

 

View solution in original post

marcus_sommer

You could only access fields which already exists in the data-source and not ones which you create in the load. One way would be to include it in the subfield-logic like:

AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,                subfield(Replace("Industry Classifications [Target/Issuer]"' (Primary)',''), ';') as Classification

or using a preceeding load like:

load %IndustryClassifications, Replace(Classification,' (Primary)','') as Classification;
load
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
from ...

 - Marcus

 

View solution in original post

8 Replies
Vegar
MVP
MVP

I assume that Industry classifications is a field in a table where you do got an [M&A Transaction ID]. Try this.

 

Transactions:
LOAD [M&A Transaction ID], Autonumberhash265("Industry Classificatios [Target/Issuer]") as %IndustryClassifications, Field1, Field3, etc FROM [YourSource];
Classifications: LOAD Autonumberhash265("Industry Classificatios [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classificatios [Target/Issuer]", ';') as Classification FROM [YourSource]

The Classifcations table you can manipulate with a common  classification techniques such ass ApplyMap() or JOIN as you only have one value per row. (But many Classifcation rows per Transactions row)

 

philipp_meissne
Contributor II
Contributor II
Author

Hi Vegar! Thanks a lot for the quick reply and the the suggestion.

 

Unfortunately, I'mnot able to follow you completely.... 

Do you mind explaining to me, what the Autonumberhash (I guess you ment to type 256 instead of 265?) function exactly does here? 

Also I'm not sure how to implement your suggestion. Now it looks like this:

LOAD
....
"CIQ Transaction ID",  // Unique Transaction ID,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification,
...
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);

However, the script is not running. Am I missing something or did I completely lose the point here?

marcus_sommer

The autonumberhash256() is used to reduce the resources for storing the KEY between your fact-table and an extra dimension-table (with n records per KEY which subfield() as an internal loop creates). This means the essential suggestion is creating two tables and associate them because it couldn't be done within a single table.

- Marcus

Vegar
MVP
MVP

 

Yes you are correct. it was a typo. The key thing to my solution is the subfield part that transforms a single semi-colon divided list to multiple rows with one semicolon entry per row. 

It looks like you did  the %IndustryClassifications twice, it will cause  your script to faile due to duplicate field names. 

Try the script below. 

Classification:
LOAD Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classifications [Target/Issuer]", ';') as Classification FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);

 

Transaction:
LOAD "CIQ Transaction ID", // Unique Transaction ID, Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);

 

 

philipp_meissne
Contributor II
Contributor II
Author

Alright, thanks for the explanation. I got the script running, which brought me one step closer to my solution 🙂 Now every classification is extracted and I can draw Sums and Counts for it, so the Scatter is possible! Many thanks for that!

 

However, I have one small issue left: Some Classifications are labeled with a '(Primary)' at the end. I do not care about it and like to drop it. Such that 'Software' and 'Software (Primary)' are counted as 'Software' only. I tried 

	AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
	subfield("Industry Classifications [Target/Issuer]", ';') as Classification,
    Replace(Classification,' (Primary)','')

but the script won't run, giving me the error message 'Field 'Classification' not found'.

 

Any suggestions for a clueless guy? 🙂

marcus_sommer

You could only access fields which already exists in the data-source and not ones which you create in the load. One way would be to include it in the subfield-logic like:

AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,                subfield(Replace("Industry Classifications [Target/Issuer]"' (Primary)',''), ';') as Classification

or using a preceeding load like:

load %IndustryClassifications, Replace(Classification,' (Primary)','') as Classification;
load
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
from ...

 - Marcus

 

philipp_meissne
Contributor II
Contributor II
Author

Yes, worked just great! Many thanks for the support guys 🙂

Vegar
MVP
MVP

Glad it worked out for you.