Solved: Re: Extract strings within a longer string - Qlik Community

philipp_meissne · ‎2019-02-04

Dear Community,

I searched the forums but unfortunately I could not find a solution to my problem. I try to describe it as good as possible:

I have a list of M&A Transactions, including the industry classifiactions of the targets, containing different levels of the different industries the target is active in. Entries might look like this:

Industry Classifications [Target/Issuer]

Application Software; Information Technology; Software; Software and Services; Information Technology (IT) Consulting; IT Consulting and Other Services; IT Services

Commercial and Professional Services; Industrials; Professional Services; Research and Consulting Services

For mapping, I have a complete list of the different industries on different levels (extract):

Industry - Level 1	Industry - Level 2	Industry - Level 3	Industry - Level 4
Information Technology	Semiconductors and Semiconductor Equipment	Semiconductors and Semiconductor Equipment	Semiconductor Equipment
Information Technology	Semiconductors and Semiconductor Equipment	Semiconductors and Semiconductor Equipment	Semiconductors
Information Technology	Software and Services	IT Services	Internet Services and Infrastructure
Information Technology	Software and Services	Software	Application Software
Information Technology	Software and Services	Software	Systems Software

What I'm trying to achieve is that if I select a level of Industry (e.g. Level 1: 'Information Technology' or Level 4: 'Application Software') and as a return I see the transactions containing the industry classification somewhere in their string for industry classification.

Let's put it another way, I want to create a scatter plot (x: Transaction Value (Sum), y: Transactions (Count)) for each level of Industries, such that the bubble 'Information Technology' contains all transactions where the string 'Information Technology' can be found in the field 'Industry Classifation.

I hope I was able to describe my Problem 🙂 I would really appreciate some suggestions/help on this one! Thanks so much in advance.

Best

Vegar · ‎2019-02-04

Yes you are correct. it was a typo. The key thing to my solution is the subfield part that transforms a single semi-colon divided list to multiple rows with one semicolon entry per row.

It looks like you did the %IndustryClassifications twice, it will cause your script to faile due to duplicate field names.

Try the script below.

Classification:
LOAD
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);

Transaction:
LOAD
"CIQ Transaction ID",  // Unique Transaction ID,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);

Vegar
Qlik Community MVP

View solution in original post

marcus_sommer · ‎2019-02-04

You could only access fields which already exists in the data-source and not ones which you create in the load. One way would be to include it in the subfield-logic like:

AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield(Replace("Industry Classifications [Target/Issuer]"' (Primary)',''), ';') as Classification

or using a preceeding load like:

load %IndustryClassifications, Replace(Classification,' (Primary)','') as Classification;
load
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
from ...

- Marcus

View solution in original post

Vegar · ‎2019-02-04

I assume that Industry classifications is a field in a table where you do got an [M&A Transaction ID]. Try this.

Transactions:
LOAD 
[M&A Transaction ID],
Autonumberhash265("Industry Classificatios [Target/Issuer]") as %IndustryClassifications,
Field1,
Field3,
etc
FROM [YourSource];

Classifications:
LOAD 
Autonumberhash265("Industry Classificatios [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classificatios [Target/Issuer]", ';') as Classification
FROM
[YourSource]

The Classifcations table you can manipulate with a common classification techniques such ass ApplyMap() or JOIN as you only have one value per row. (But many Classifcation rows per Transactions row)

Vegar
Qlik Community MVP

philipp_meissne · ‎2019-02-04

Hi Vegar! Thanks a lot for the quick reply and the the suggestion.

Unfortunately, I'mnot able to follow you completely....

Do you mind explaining to me, what the Autonumberhash (I guess you ment to type 256 instead of 265?) function exactly does here?

Also I'm not sure how to implement your suggestion. Now it looks like this:

LOAD
....
"CIQ Transaction ID", // Unique Transaction ID,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification,
...
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);

However, the script is not running. Am I missing something or did I completely lose the point here?

marcus_sommer · ‎2019-02-04

The autonumberhash256() is used to reduce the resources for storing the KEY between your fact-table and an extra dimension-table (with n records per KEY which subfield() as an internal loop creates). This means the essential suggestion is creating two tables and associate them because it couldn't be done within a single table.

- Marcus

Vegar · ‎2019-02-04

Yes you are correct. it was a typo. The key thing to my solution is the subfield part that transforms a single semi-colon divided list to multiple rows with one semicolon entry per row.

It looks like you did the %IndustryClassifications twice, it will cause your script to faile due to duplicate field names.

Try the script below.

Classification:
LOAD
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);

Transaction:
LOAD
"CIQ Transaction ID",  // Unique Transaction ID,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);

Vegar
Qlik Community MVP

philipp_meissne · ‎2019-02-04

Alright, thanks for the explanation. I got the script running, which brought me one step closer to my solution 🙂 Now every classification is extracted and I can draw Sums and Counts for it, so the Scatter is possible! Many thanks for that!

However, I have one small issue left: Some Classifications are labeled with a '(Primary)' at the end. I do not care about it and like to drop it. Such that 'Software' and 'Software (Primary)' are counted as 'Software' only. I tried

	AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
	subfield("Industry Classifications [Target/Issuer]", ';') as Classification,
    Replace(Classification,' (Primary)','')

but the script won't run, giving me the error message 'Field 'Classification' not found'.

Any suggestions for a clueless guy? 🙂

marcus_sommer · ‎2019-02-04

You could only access fields which already exists in the data-source and not ones which you create in the load. One way would be to include it in the subfield-logic like:

AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield(Replace("Industry Classifications [Target/Issuer]"' (Primary)',''), ';') as Classification

or using a preceeding load like:

load %IndustryClassifications, Replace(Classification,' (Primary)','') as Classification;
load
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
from ...

- Marcus

philipp_meissne · ‎2019-02-04

Yes, worked just great! Many thanks for the support guys 🙂

Vegar · ‎2019-02-04

Glad it worked out for you.

Vegar
Qlik Community MVP

Extract strings within a longer string

search string in string

sum if sting