Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear Community,
I searched the forums but unfortunately I could not find a solution to my problem. I try to describe it as good as possible:
I have a list of M&A Transactions, including the industry classifiactions of the targets, containing different levels of the different industries the target is active in. Entries might look like this:
Industry Classifications [Target/Issuer] |
Application Software; Information Technology; Software; Software and Services; Information Technology (IT) Consulting; IT Consulting and Other Services; IT Services |
Commercial and Professional Services; Industrials; Professional Services; Research and Consulting Services |
For mapping, I have a complete list of the different industries on different levels (extract):
Industry - Level 1 | Industry - Level 2 | Industry - Level 3 | Industry - Level 4 |
Information Technology | Semiconductors and Semiconductor Equipment | Semiconductors and Semiconductor Equipment | Semiconductor Equipment |
Information Technology | Semiconductors and Semiconductor Equipment | Semiconductors and Semiconductor Equipment | Semiconductors |
Information Technology | Software and Services | IT Services | Internet Services and Infrastructure |
Information Technology | Software and Services | Software | Application Software |
Information Technology | Software and Services | Software | Systems Software |
What I'm trying to achieve is that if I select a level of Industry (e.g. Level 1: 'Information Technology' or Level 4: 'Application Software') and as a return I see the transactions containing the industry classification somewhere in their string for industry classification.
Let's put it another way, I want to create a scatter plot (x: Transaction Value (Sum), y: Transactions (Count)) for each level of Industries, such that the bubble 'Information Technology' contains all transactions where the string 'Information Technology' can be found in the field 'Industry Classifation.
I hope I was able to describe my Problem 🙂 I would really appreciate some suggestions/help on this one! Thanks so much in advance.
Best
Yes you are correct. it was a typo. The key thing to my solution is the subfield part that transforms a single semi-colon divided list to multiple rows with one semicolon entry per row.
It looks like you did the %IndustryClassifications twice, it will cause your script to faile due to duplicate field names.
Try the script below.
Classification:
LOAD Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classifications [Target/Issuer]", ';') as Classification FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);
Transaction:
LOAD "CIQ Transaction ID", // Unique Transaction ID, Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);
You could only access fields which already exists in the data-source and not ones which you create in the load. One way would be to include it in the subfield-logic like:
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield(Replace("Industry Classifications [Target/Issuer]"' (Primary)',''), ';') as Classification
or using a preceeding load like:
load %IndustryClassifications, Replace(Classification,' (Primary)','') as Classification;
load
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
from ...
- Marcus
I assume that Industry classifications is a field in a table where you do got an [M&A Transaction ID]. Try this.
Transactions:
LOAD [M&A Transaction ID], Autonumberhash265("Industry Classificatios [Target/Issuer]") as %IndustryClassifications, Field1, Field3, etc FROM [YourSource];
Classifications: LOAD Autonumberhash265("Industry Classificatios [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classificatios [Target/Issuer]", ';') as Classification FROM [YourSource]
The Classifcations table you can manipulate with a common classification techniques such ass ApplyMap() or JOIN as you only have one value per row. (But many Classifcation rows per Transactions row)
Hi Vegar! Thanks a lot for the quick reply and the the suggestion.
Unfortunately, I'mnot able to follow you completely....
Do you mind explaining to me, what the Autonumberhash (I guess you ment to type 256 instead of 265?) function exactly does here?
Also I'm not sure how to implement your suggestion. Now it looks like this:
LOAD
....
"CIQ Transaction ID", // Unique Transaction ID,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification,
...
FROM [lib://AttachedFiles/xxx]
(ooxml, embedded labels, header is 1 lines, table is Screening);
However, the script is not running. Am I missing something or did I completely lose the point here?
The autonumberhash256() is used to reduce the resources for storing the KEY between your fact-table and an extra dimension-table (with n records per KEY which subfield() as an internal loop creates). This means the essential suggestion is creating two tables and associate them because it couldn't be done within a single table.
- Marcus
Yes you are correct. it was a typo. The key thing to my solution is the subfield part that transforms a single semi-colon divided list to multiple rows with one semicolon entry per row.
It looks like you did the %IndustryClassifications twice, it will cause your script to faile due to duplicate field names.
Try the script below.
Classification:
LOAD Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classifications [Target/Issuer]", ';') as Classification FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);
Transaction:
LOAD "CIQ Transaction ID", // Unique Transaction ID, Autonumberhash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications FROM [lib://AttachedFiles/xxx] (ooxml, embedded labels, header is 1 lines, table is Screening);
Alright, thanks for the explanation. I got the script running, which brought me one step closer to my solution 🙂 Now every classification is extracted and I can draw Sums and Counts for it, so the Scatter is possible! Many thanks for that!
However, I have one small issue left: Some Classifications are labeled with a '(Primary)' at the end. I do not care about it and like to drop it. Such that 'Software' and 'Software (Primary)' are counted as 'Software' only. I tried
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield("Industry Classifications [Target/Issuer]", ';') as Classification, Replace(Classification,' (Primary)','')
but the script won't run, giving me the error message 'Field 'Classification' not found'.
Any suggestions for a clueless guy? 🙂
You could only access fields which already exists in the data-source and not ones which you create in the load. One way would be to include it in the subfield-logic like:
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications, subfield(Replace("Industry Classifications [Target/Issuer]"' (Primary)',''), ';') as Classification
or using a preceeding load like:
load %IndustryClassifications, Replace(Classification,' (Primary)','') as Classification;
load
AutoNumberHash256("Industry Classifications [Target/Issuer]") as %IndustryClassifications,
subfield("Industry Classifications [Target/Issuer]", ';') as Classification
from ...
- Marcus
Yes, worked just great! Many thanks for the support guys 🙂